1 Overview of This Report

When a large number of employees leave an organization, remaining employees are likely to struggle with taking on extra work, which can lead to burnout and potentially more resignations. Additionally, organizations need to invest time and resources in recruiting and training new employees to get them up to speed. Therefore, to effectively manage organizations and prevent a myriad of other problems stemming from regrettable employee attrition (i.e., high-performing employees leaving), it is crucial to use data-driven insights. The first step in preventing employee attrition is to identify factors that might be causing it by analyzing existing data.

This document walks through how a people scientist (people analyst) might use organizational data to identify factors contributing to employee attrition. The report is structured as follows:

  • 1.Modeling the Probability of Attrition

    • a.Data Pre-Preparation
    • b.Exploratory Data Analysis
    • c.Further Data Preparation
    • d.Model Building
  • 2.Outcome of the Employee Attrition Analysis

    • a.What factors are contributing to the high attrition?
    • b.Which variable is the most important and needs to be addressed immediately?
    • c.What changes should the company make to their workplace to support better retention?
  • 3.Recommendations to Improve the Efficiency of the Company’s Data Collection & Analysis Process as a Data Strategist

    • a.The need for additional data collection and measure development
    • b.Adjustment of current scales for several measures
    • c.Addition of key variables of attrition from current workplace research

2 1.Modeling the Probability of Attrition

Attrition is a binary outcome (an employee either stays with the company or leaves). Linear regression fits such data poorly because its predictions are not bounded between 0 and 1, so it is likely to lead to misleading conclusions about which factors influence employee attrition. Therefore, I use logistic regression, which passes a linear combination of the predictors through the sigmoid function, to model the probability of attrition. This section proceeds through the following steps:
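Before walking through those steps, the contrast between the two models can be illustrated with a minimal sketch. It uses simulated data rather than the report's datasets, and the predictor `overtime_hours`, the coefficients, and the prediction grid are assumptions chosen only for illustration. It shows that predictions from `glm(..., family = binomial)` stay strictly between 0 and 1, whereas a linear model can return values outside that range.

```r
# Sigmoid (logistic) function: maps any real number into the interval (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(42)  # simulated data only; not the report's dataset
n <- 500
overtime_hours <- runif(n, min = 0, max = 5)      # hypothetical predictor
p_leave <- sigmoid(-2 + 0.8 * overtime_hours)      # assumed true probabilities
attrition <- rbinom(n, size = 1, prob = p_leave)   # 1 = left, 0 = stayed
toy <- data.frame(attrition, overtime_hours)

# Fit both models to the same binary outcome
linear_fit   <- lm(attrition ~ overtime_hours, data = toy)
logistic_fit <- glm(attrition ~ overtime_hours, data = toy, family = binomial)

# Predict over a grid that extends beyond the observed range
grid <- data.frame(overtime_hours = c(-5, 0, 5, 15))
linear_pred   <- predict(linear_fit, newdata = grid)
logistic_pred <- predict(logistic_fit, newdata = grid, type = "response")

print(cbind(grid, linear_pred, logistic_pred))
```

With `type = "response"`, `predict()` returns fitted probabilities; the linear fit has no such bound, which is the reason for preferring logistic regression here.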

2.1 a.Data Pre-Preparation

  • Check for and install all necessary packages so that the analysis can be replicated smoothly.
# List of required packages
required_packages <- c("pROC", "broom", "car", "carData", "caTools", "ggplot2", "purrr", "dplyr", "Amelia", "Rcpp", "readr", "knitr", "ROSE", "caret")

# Check if each package is installed, if not, install it
for (pkg in required_packages) {
  if (!(pkg %in% rownames(installed.packages()))) {
    install.packages(pkg, dependencies = TRUE)
  }
}

2.1.1 a.1.Load all five datasets (tables)

library(readr)
employee_survey_data <- read_csv("src/employee_survey_data.csv")
general_data <- read_csv("src/general_data.csv")
in_time <- read_csv("src/in_time.csv")
manager_survey_data <- read_csv("src/manager_survey_data.csv")
out_time <- read_csv("src/out_time.csv")

2.1.2 a.2.Check the structure of each data frame

Inspect each data frame to grasp the type of each variable, the number of observations, and the categories and values each variable takes.

str(employee_survey_data)
print(head(employee_survey_data))
str(general_data)
print(head(general_data))
str(in_time)
print(head(in_time))
str(manager_survey_data)
print(head(manager_survey_data))
str(out_time)
print(head(out_time))

2.1.3 a.3.Check for duplicates in each data frame

Each row corresponds to a unique employee (no duplicates found).

sum(duplicated(employee_survey_data))
## [1] 0
sum(duplicated(general_data))
## [1] 0
sum(duplicated(in_time))
## [1] 0
sum(duplicated(manager_survey_data))
## [1] 0
sum(duplicated(out_time))
## [1] 0
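Note that `duplicated()` applied to a whole data frame only flags rows that are identical in every column. As a complementary check, the sketch below tests the identifier column on its own. The toy data frame and its values are made up for illustration; the column name `EmployeeID` follows the survey and general tables.

```r
# Toy table: EmployeeID 2 appears twice, but the two rows differ elsewhere
toy_survey <- data.frame(
  EmployeeID      = c(1, 2, 2, 3),
  JobSatisfaction = c(4, 3, 1, 2)
)

# Whole-row check: no two rows are identical in every column
full_row_duplicates <- sum(duplicated(toy_survey))

# Key-only check: counts repeated identifiers
id_duplicates <- sum(duplicated(toy_survey$EmployeeID))

print(full_row_duplicates)
print(id_duplicates)
```

A result of zero from the key-only check is what supports the statement that each row corresponds to a unique employee.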

2.1.4 a.4.Check the % of missing observations in each dataframe

The percentage of missing data in each data frame is too small to significantly influence the results of the logistic regression model I plan to build.

  • employee_survey_data: the percentage of missing observations is close to 0%.
  • general_data: the percentage of missing observations is close to 0%.
  • manager_survey_data: the percentage of missing observations is 0%.
  • in_time: the percentage of missing observations is 9%. The missingness map suggests that missing observations are randomly distributed, except for holidays (e.g., January 1) and the company’s closing days, which are missing consistently across employees.
  • out_time: the percentage of missing observations is 9%. The missingness map suggests that missing observations are randomly distributed, except for holidays (e.g., January 14) and the company’s closing days, which are missing consistently across employees.
  • Because the percentage of missing observations in the ‘in_time’ and ‘out_time’ data frames is relatively small and the missing values are randomly distributed, they are not likely to significantly influence the average work hours per day that I plan to calculate for each employee. Therefore, I leave them as they are.
library(Amelia) # use this package to identify missing observations for each dataframe. 
missmap(employee_survey_data, main = 'missing maps for employee_survey_data', col=c('yellow', 'black'), legend = TRUE)

missmap(general_data, main = 'missing maps for general_data', col=c('yellow', 'black'), legend = TRUE)

missmap(manager_survey_data, main = 'missing maps for manager_survey_data', col=c('yellow', 'black'), legend = TRUE)

missmap(in_time, main = 'missing maps for in_time data', col=c('yellow', 'black'), legend = TRUE)

missmap(out_time, main = 'missing maps for out_time data', col=c('yellow', 'black'), legend = TRUE)
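The maps show where values are missing; the percentages quoted above can also be computed directly. The following is a minimal sketch on a toy data frame. The helper `pct_missing()` and the toy values are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: percentage of missing cells in a data frame
pct_missing <- function(df) round(100 * mean(is.na(df)), 2)

# Toy table: day_1 is missing for everyone (e.g., a holiday),
# day_2 is missing for one employee only
toy_in_time <- data.frame(
  EmployeeID = 1:4,
  day_1 = c(NA, NA, NA, NA),
  day_2 = c(9.5, NA, 10.1, 9.8)
)

overall_pct <- pct_missing(toy_in_time)

# Share of missing values per column; a share of 1 marks a day
# that is missing for all employees
col_share <- colMeans(is.na(toy_in_time))
all_missing_days <- names(col_share)[col_share == 1]

print(overall_pct)
print(all_missing_days)
```

Columns that are missing for every employee correspond to the holidays and closing days visible in the maps.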

2.1.5 a.5.Feature engineering & data cleaning

2.1.5.1 a.5.1.Calculate the average work hours for each employee per day using in_time & out_time tables

The in_time and out_time data frames record each employee’s in and out times for the work days in 2015, so together they let us calculate the average work hours per day for each employee.
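The calculation can be sketched on a toy example as follows. The two small tables and their timestamps are invented for illustration. Averaging with `na.rm = TRUE` is an assumption about how missing days are handled: it excludes them from each employee's average rather than counting them as zero hours.

```r
# Toy in/out tables: two employees, two work days (invented timestamps)
toy_in <- data.frame(
  EmployeeID = 1:2,
  `2015-01-02` = as.POSIXct(c("2015-01-02 09:30:00", "2015-01-02 10:00:00"), tz = "UTC"),
  `2015-01-05` = as.POSIXct(c("2015-01-05 10:00:00", NA), tz = "UTC"),
  check.names = FALSE
)
toy_out <- data.frame(
  EmployeeID = 1:2,
  `2015-01-02` = as.POSIXct(c("2015-01-02 17:00:00", "2015-01-02 17:00:00"), tz = "UTC"),
  `2015-01-05` = as.POSIXct(c("2015-01-05 18:00:00", NA), tz = "UTC"),
  check.names = FALSE
)

# Hours worked per employee (rows) and day (columns), as plain numbers
day_cols <- setdiff(names(toy_in), "EmployeeID")
hours <- sapply(day_cols, function(d) {
  as.numeric(difftime(toy_out[[d]], toy_in[[d]], units = "hours"))
})

# Average over the days with data; missing days are dropped (assumption)
avg_hours <- rowMeans(hours, na.rm = TRUE)
print(data.frame(EmployeeID = toy_in$EmployeeID, avg_hours))
```

Fixing `units = "hours"` in `difftime()` keeps every column on the same scale before averaging.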

2.1.5.2 a.5.2.Provide correct column names for ‘EmployeeID’ in both in_time & out_time tables

In both tables, the first column, which holds each employee’s unique identifier, has no column name. I give it the name ‘EmployeeID’ in both tables so that I can later use this column to merge all the tables (all the other data frames already have a common column named ‘EmployeeID’).

colnames(in_time)[1] <- "EmployeeID"
colnames(out_time)[1] <- "EmployeeID"
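Since the two tables are combined element by element in the next step, it can be worth confirming that their rows and columns line up. The sketch below shows one way to do so on toy tables; the data values are invented, and running the same checks on the real `in_time` and `out_time` tables is a suggestion rather than part of the original analysis.

```r
# Toy tables with the same employees and the same day column
toy_in <- data.frame(
  EmployeeID = c(1, 2, 3),
  `2015-01-02` = c(9.5, 10.0, 9.8),
  check.names = FALSE
)
toy_out <- data.frame(
  EmployeeID = c(1, 2, 3),
  `2015-01-02` = c(17.0, 17.5, 17.1),
  check.names = FALSE
)

# Rows must refer to the same employees in the same order
ids_aligned <- identical(toy_in$EmployeeID, toy_out$EmployeeID)

# Columns must refer to the same days in the same order
cols_aligned <- identical(names(toy_in), names(toy_out))

# Stop early if either condition fails
stopifnot(ids_aligned, cols_aligned)
```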

2.1.5.3 a.5.3.Calculate work hours per employee

Using both the ‘out_time’ and ‘in_time’ data frames, we create a new data frame that gives the number of hours each employee worked on each work day.

employee_work_hours <- 
  cbind(out_time["EmployeeID"], out_time[, -1] - in_time[, -1])
print(head(employee_work_hours))
##   EmployeeID 2015-01-01      2015-01-02      2015-01-05      2015-01-06
## 1          1         NA  7.208333 hours  7.189722 hours  7.410833 hours
## 2          2         NA  8.109167 hours  7.454722 hours        NA hours
## 3          3         NA  6.692500 hours  7.265556 hours  6.405278 hours
## 4          4         NA  7.338333 hours  7.291944 hours  6.943056 hours
## 5          5         NA  8.055556 hours  7.988056 hours  7.682500 hours
## 6          6         NA 10.779444 hours 10.721944 hours 10.963611 hours
##        2015-01-07      2015-01-08      2015-01-09      2015-01-12
## 1  7.006667 hours  7.289722 hours  7.484444 hours  7.262778 hours
## 2  7.396944 hours  7.416667 hours  7.150833 hours  7.611389 hours
## 3  6.765000 hours  7.345000 hours  6.861389 hours  7.418611 hours
## 4  6.919444 hours  6.850833 hours  7.193056 hours  6.998611 hours
## 5  7.806111 hours  7.662222 hours  7.721667 hours  8.365000 hours
## 6 10.298611 hours 11.009444 hours 11.099722 hours 10.838056 hours
##        2015-01-13 2015-01-14      2015-01-15      2015-01-16      2015-01-19
## 1  7.831111 hours         NA  7.346944 hours  7.267500 hours        NA hours
## 2  7.278889 hours         NA  7.613056 hours  7.727500 hours  7.577500 hours
## 3  6.999722 hours         NA  7.438333 hours  7.210278 hours  7.072222 hours
## 4  7.306389 hours         NA  6.876667 hours  6.907778 hours  6.518611 hours
## 5  8.257222 hours         NA  8.260000 hours  8.403611 hours        NA hours
## 6 10.270000 hours         NA 10.651111 hours 10.921944 hours 11.352500 hours
##        2015-01-20      2015-01-21      2015-01-22      2015-01-23 2015-01-26
## 1  6.775833 hours  7.095000 hours  7.050556 hours  7.604722 hours         NA
## 2  7.602778 hours  7.905556 hours  7.376389 hours  7.778889 hours         NA
## 3  6.920556 hours  6.802778 hours  7.478611 hours  6.921667 hours         NA
## 4  7.178889 hours  6.705000 hours  7.011389 hours  7.145833 hours         NA
## 5  7.815278 hours  8.223056 hours  8.290833 hours  7.310278 hours         NA
## 6 10.521389 hours 10.992500 hours 11.111111 hours 10.829167 hours         NA
##       2015-01-27      2015-01-28      2015-01-29      2015-01-30
## 1 7.629167 hours  7.118889 hours  7.413611 hours  6.849722 hours
## 2 7.467222 hours  7.189722 hours  7.259444 hours  7.059167 hours
## 3 7.249167 hours  6.305833 hours  7.221944 hours  7.024167 hours
## 4 7.254722 hours  7.545278 hours  6.909167 hours  7.219722 hours
## 5 7.928889 hours  7.928611 hours  8.079167 hours  8.028611 hours
## 6       NA hours 10.883611 hours 11.044444 hours 10.577500 hours
##        2015-02-02     2015-02-03      2015-02-04      2015-02-05
## 1  6.901667 hours 7.203056 hours  7.605278 hours  7.565278 hours
## 2  7.631111 hours 7.632500 hours  7.644167 hours  7.637222 hours
## 3  7.071111 hours 6.520278 hours  7.364167 hours  6.358333 hours
## 4  7.067778 hours 7.436111 hours  6.977222 hours  7.151389 hours
## 5  8.165000 hours 8.028889 hours  7.996667 hours  8.090556 hours
## 6 10.555000 hours       NA hours 10.856667 hours 11.012500 hours
##        2015-02-06      2015-02-09      2015-02-10      2015-02-11
## 1  7.470000 hours  7.601389 hours  7.267222 hours  7.193056 hours
## 2  7.900833 hours        NA hours  8.034444 hours  7.637778 hours
## 3  6.906667 hours  6.659722 hours  6.825278 hours  6.785833 hours
## 4  7.145278 hours  7.333889 hours        NA hours  7.060833 hours
## 5  7.708611 hours  8.186667 hours  7.678889 hours  7.963611 hours
## 6 10.803056 hours 10.911667 hours 11.298889 hours 11.338889 hours
##        2015-02-12      2015-02-13      2015-02-16      2015-02-17
## 1  7.435278 hours  7.205278 hours  7.605833 hours  7.416111 hours
## 2  7.964444 hours  7.733333 hours  7.699167 hours  7.406944 hours
## 3  6.455833 hours  7.144722 hours  7.022778 hours  6.685278 hours
## 4  7.341111 hours  7.145556 hours  7.350278 hours  7.110556 hours
## 5  7.670833 hours  8.720556 hours  7.922222 hours  7.902500 hours
## 6 10.548611 hours 10.965000 hours 10.961111 hours 11.320278 hours
##        2015-02-18      2015-02-19      2015-02-20      2015-02-23
## 1  7.839722 hours        NA hours  7.832222 hours  7.441111 hours
## 2  7.903889 hours  7.912778 hours  7.424167 hours  8.255556 hours
## 3  7.046944 hours  6.960278 hours  6.886389 hours  7.754722 hours
## 4  6.872500 hours  7.523611 hours  6.697778 hours  7.179722 hours
## 5  8.094167 hours  7.846111 hours  8.230833 hours  7.802778 hours
## 6 10.732500 hours 10.705000 hours 10.732500 hours 10.641667 hours
##        2015-02-24      2015-02-25      2015-02-26      2015-02-27
## 1  7.225000 hours  7.583611 hours  7.200833 hours  7.385833 hours
## 2  7.800000 hours  7.752222 hours  7.846667 hours  8.120278 hours
## 3  6.751667 hours  6.940278 hours  7.553056 hours  6.855000 hours
## 4  7.064167 hours  6.280556 hours  7.358056 hours  7.263889 hours
## 5  8.118889 hours  8.032222 hours  7.709722 hours  7.967222 hours
## 6 11.171111 hours 10.961667 hours 10.946944 hours 10.714444 hours
##        2015-03-02      2015-03-03      2015-03-04 2015-03-05      2015-03-06
## 1  7.156944 hours  6.759722 hours  7.744444 hours         NA  7.815000 hours
## 2  7.681667 hours  7.693611 hours  7.927500 hours         NA  7.339722 hours
## 3  7.493056 hours  7.054444 hours  7.188611 hours         NA  6.771111 hours
## 4        NA hours  7.403333 hours  6.463611 hours         NA        NA hours
## 5  8.693056 hours  7.950278 hours  8.072778 hours         NA  8.182778 hours
## 6 10.339444 hours 11.358889 hours 10.933889 hours         NA 10.573333 hours
##        2015-03-09      2015-03-10      2015-03-11      2015-03-12
## 1  7.408611 hours  6.923056 hours  7.161111 hours  7.080000 hours
## 2  8.130000 hours  7.690556 hours  7.986667 hours  7.522500 hours
## 3  6.846944 hours  6.873056 hours  7.079722 hours  7.275556 hours
## 4  7.157222 hours  7.397222 hours  7.493333 hours  7.624167 hours
## 5  8.356389 hours  8.064722 hours  8.143611 hours  8.083611 hours
## 6 10.835556 hours 10.422778 hours 10.864167 hours 10.755000 hours
##        2015-03-13      2015-03-16      2015-03-17      2015-03-18
## 1  7.310278 hours        NA hours  6.915000 hours  7.197778 hours
## 2  7.819722 hours  7.205278 hours  7.365000 hours  7.267500 hours
## 3  6.821389 hours  6.760556 hours  7.120833 hours  7.316667 hours
## 4  6.817222 hours  6.724444 hours  7.208333 hours  7.109722 hours
## 5  8.051944 hours  7.789722 hours  7.950278 hours  8.106111 hours
## 6 10.253333 hours 11.081667 hours 10.780833 hours 10.656111 hours
##        2015-03-19      2015-03-20      2015-03-23      2015-03-24
## 1  7.847778 hours  7.162500 hours  7.616389 hours  7.251944 hours
## 2  7.697500 hours  7.430278 hours  7.948056 hours        NA hours
## 3  6.545556 hours        NA hours  6.767500 hours  6.900278 hours
## 4  7.006389 hours  7.206389 hours  6.947778 hours  6.926667 hours
## 5  7.740000 hours  7.769444 hours  7.674722 hours  7.675278 hours
## 6 10.462222 hours 10.928889 hours 10.791389 hours 10.821111 hours
##        2015-03-25      2015-03-26      2015-03-27      2015-03-30
## 1  7.445556 hours        NA hours  7.555833 hours  7.356944 hours
## 2  7.320833 hours  7.925556 hours  7.774167 hours  7.401944 hours
## 3  6.981944 hours  7.418333 hours        NA hours  6.714444 hours
## 4  7.346111 hours  7.812778 hours        NA hours  7.042222 hours
## 5  7.841667 hours  7.784722 hours  7.898333 hours  7.699167 hours
## 6 11.006389 hours 10.972778 hours 10.854444 hours 11.045556 hours
##        2015-03-31      2015-04-01      2015-04-02      2015-04-03
## 1  7.865000 hours  7.336944 hours  7.658611 hours  7.187778 hours
## 2  7.972778 hours  7.728333 hours  7.753889 hours  7.617500 hours
## 3  7.478333 hours  6.911944 hours  6.833333 hours  6.748333 hours
## 4  7.613333 hours  7.509722 hours  6.913889 hours  7.069444 hours
## 5  7.803889 hours  8.037500 hours  8.101667 hours  7.827778 hours
## 6 11.513611 hours 10.860278 hours 10.090278 hours 11.047222 hours
##        2015-04-06      2015-04-07      2015-04-08      2015-04-09
## 1  7.040833 hours  7.640000 hours  7.427222 hours  7.803056 hours
## 2  7.371389 hours  7.473333 hours  7.383056 hours  7.087778 hours
## 3  6.905278 hours  7.235000 hours  6.960278 hours  7.461667 hours
## 4  7.398056 hours  7.055556 hours  7.219444 hours  7.050833 hours
## 5  8.355556 hours  8.037500 hours  7.581944 hours  8.145556 hours
## 6 10.895833 hours 10.772222 hours 10.592222 hours 11.588056 hours
##        2015-04-10      2015-04-13      2015-04-14      2015-04-15
## 1  7.115833 hours  7.348056 hours  7.146667 hours  7.459722 hours
## 2  7.921111 hours  8.192778 hours  8.174167 hours  7.475000 hours
## 3  6.755278 hours  7.265556 hours  7.233056 hours  7.270833 hours
## 4  6.971944 hours  6.834167 hours        NA hours  6.495556 hours
## 5  7.946667 hours  7.871389 hours  8.528056 hours  8.073333 hours
## 6 11.322500 hours 11.540556 hours 10.964167 hours 11.267222 hours
##        2015-04-16      2015-04-17      2015-04-20      2015-04-21
## 1  7.756944 hours  7.284722 hours  7.695278 hours  6.975556 hours
## 2  7.238333 hours  7.738889 hours  7.784167 hours  8.466667 hours
## 3  7.060556 hours  7.611667 hours  7.449444 hours  7.140833 hours
## 4  7.079167 hours  6.984444 hours  7.305833 hours  7.077778 hours
## 5  8.168333 hours  7.878889 hours  8.321111 hours        NA hours
## 6 10.613889 hours 11.086111 hours 11.164722 hours 11.021389 hours
##        2015-04-22      2015-04-23      2015-04-24      2015-04-27
## 1  7.525000 hours  7.336389 hours  7.562500 hours  7.241944 hours
## 2  8.003611 hours  7.958056 hours  7.532500 hours  7.639167 hours
## 3  7.446111 hours        NA hours  6.914444 hours  7.002778 hours
## 4  7.257222 hours  7.636944 hours  7.443611 hours  7.374167 hours
## 5  7.492500 hours  7.830278 hours  7.842222 hours  8.670556 hours
## 6 10.871111 hours 10.630833 hours 10.824444 hours 10.830278 hours
##        2015-04-28      2015-04-29      2015-04-30 2015-05-01     2015-05-04
## 1  7.889167 hours  7.691667 hours  7.508056 hours         NA 7.410556 hours
## 2  7.606667 hours        NA hours  8.081944 hours         NA 7.793056 hours
## 3  7.343056 hours  6.963611 hours  7.050556 hours         NA 6.787500 hours
## 4  6.926944 hours  7.144167 hours  7.120833 hours         NA 7.503889 hours
## 5  7.493056 hours  7.647778 hours  7.785000 hours         NA 7.910833 hours
## 6 10.613611 hours 10.101667 hours 11.000278 hours         NA 9.971944 hours
##        2015-05-05      2015-05-06      2015-05-07      2015-05-08
## 1  7.308611 hours  7.456667 hours  7.062778 hours  7.615000 hours
## 2  7.718056 hours  7.858056 hours  7.608333 hours  7.179167 hours
## 3  6.330278 hours  7.111667 hours  6.872500 hours  6.871389 hours
## 4  6.727222 hours  7.843889 hours  7.156389 hours  7.714444 hours
## 5  8.336667 hours  8.553889 hours  7.362500 hours  7.673889 hours
## 6 11.514722 hours 10.658056 hours 10.495556 hours 10.094167 hours
##       2015-05-11      2015-05-12      2015-05-13      2015-05-14
## 1       NA hours  7.444722 hours  7.502500 hours  7.901667 hours
## 2 7.526111 hours  7.917222 hours  7.480278 hours        NA hours
## 3 7.032500 hours  7.101944 hours  6.831111 hours  7.276111 hours
## 4 7.134722 hours  7.274444 hours  7.174722 hours  7.263333 hours
## 5 7.791111 hours  8.631111 hours  7.473611 hours  7.964167 hours
## 6       NA hours 11.448889 hours 10.783056 hours 11.020000 hours
##        2015-05-15      2015-05-18      2015-05-19      2015-05-20
## 1  6.816944 hours        NA hours  7.333056 hours  7.319444 hours
## 2  7.397500 hours        NA hours  8.082222 hours  7.204167 hours
## 3  6.610278 hours  7.237778 hours  7.414167 hours  7.436667 hours
## 4  7.146389 hours  6.886944 hours  7.508611 hours  6.623056 hours
## 5  7.811667 hours  7.521389 hours  8.317222 hours  7.938889 hours
## 6 10.526944 hours 10.592222 hours 11.003333 hours 10.936944 hours
##        2015-05-21      2015-05-22      2015-05-25     2015-05-26
## 1  7.465278 hours  7.128889 hours  7.573056 hours 7.199722 hours
## 2  7.714722 hours  8.022778 hours  8.079722 hours 8.119167 hours
## 3  7.116111 hours  6.832778 hours  7.280000 hours 6.265833 hours
## 4  7.210278 hours        NA hours  7.212500 hours 7.373056 hours
## 5  7.855000 hours  8.155000 hours  8.644722 hours 8.501111 hours
## 6 10.742778 hours 10.750833 hours 10.698333 hours       NA hours
##        2015-05-27      2015-05-28      2015-05-29      2015-06-01
## 1  7.877778 hours  7.447778 hours        NA hours        NA hours
## 2  7.968056 hours  7.277778 hours  8.052778 hours  8.107222 hours
## 3  7.151389 hours  6.307222 hours  6.743611 hours  6.788889 hours
## 4  7.582500 hours  7.667778 hours  7.114444 hours  6.965556 hours
## 5  8.113333 hours  7.747778 hours  8.319167 hours  8.061944 hours
## 6 11.175278 hours 10.688889 hours 10.705000 hours 10.790278 hours
##        2015-06-02      2015-06-03      2015-06-04      2015-06-05
## 1  7.717222 hours  7.514167 hours  7.041944 hours        NA hours
## 2  7.616389 hours  7.793611 hours  6.920833 hours  7.664167 hours
## 3  7.177500 hours  7.313611 hours  6.813889 hours  6.684444 hours
## 4  7.801667 hours  7.316667 hours  7.451389 hours  6.787222 hours
## 5  7.799167 hours  8.192222 hours  7.627222 hours  8.221667 hours
## 6 10.358889 hours 11.225833 hours 10.940833 hours 10.305833 hours
##        2015-06-08      2015-06-09      2015-06-10      2015-06-11
## 1  7.072778 hours        NA hours  7.075278 hours  7.575000 hours
## 2  7.583611 hours  7.429722 hours  8.091944 hours  7.980278 hours
## 3  7.640556 hours  6.933889 hours  7.336389 hours  7.211111 hours
## 4  7.283056 hours  7.201944 hours  7.254444 hours  7.433333 hours
## 5  8.157500 hours  7.806944 hours  7.814167 hours        NA hours
## 6 10.858056 hours 11.035000 hours 10.781389 hours 10.803611 hours
##        2015-06-12      2015-06-15      2015-06-16      2015-06-17
## 1  7.421389 hours  7.595278 hours  7.542222 hours  7.526111 hours
## 2        NA hours  8.050278 hours  7.766944 hours  8.225556 hours
## 3  7.119722 hours  7.097222 hours  6.915278 hours  6.994167 hours
## 4  7.121111 hours  7.332778 hours  6.905278 hours  7.268056 hours
## 5  8.292778 hours  8.129722 hours  8.081667 hours  8.349722 hours
## 6 11.107222 hours 10.643333 hours 10.984444 hours 10.983611 hours
##        2015-06-18      2015-06-19      2015-06-22      2015-06-23
## 1  7.521389 hours  7.248889 hours  7.178333 hours  6.912778 hours
## 2  7.883611 hours  7.706389 hours  8.000833 hours  7.793889 hours
## 3  7.258889 hours  6.940278 hours  6.616667 hours  7.142778 hours
## 4  7.751944 hours  7.298333 hours  7.568889 hours  6.837778 hours
## 5  8.486111 hours  7.611389 hours  8.705556 hours  7.821111 hours
## 6 10.922778 hours 10.819722 hours 11.068611 hours 10.843889 hours
##        2015-06-24      2015-06-25      2015-06-26     2015-06-29     2015-06-30
## 1  7.112778 hours  7.753611 hours  7.052222 hours 7.661667 hours 7.303056 hours
## 2  7.528611 hours  7.945833 hours  7.679722 hours 7.968333 hours 7.111389 hours
## 3        NA hours  6.570833 hours  7.057222 hours 6.770556 hours 6.589444 hours
## 4  7.466111 hours  6.956389 hours  7.136944 hours 7.152778 hours 7.199444 hours
## 5  7.549722 hours  7.900833 hours  8.241389 hours 7.941111 hours 8.116944 hours
## 6 10.275833 hours 10.806389 hours 10.813333 hours 9.874444 hours       NA hours
##       2015-07-01     2015-07-02      2015-07-03      2015-07-06      2015-07-07
## 1 7.723611 hours 7.618333 hours  7.158889 hours  7.947500 hours  7.651389 hours
## 2 7.700278 hours 7.807778 hours  8.143889 hours  7.719444 hours  7.848611 hours
## 3 6.716389 hours 6.851944 hours  6.928611 hours  7.966389 hours  6.763056 hours
## 4 6.865000 hours 7.371389 hours  7.507500 hours  7.376667 hours  7.554444 hours
## 5 7.708611 hours 7.978333 hours  7.599444 hours  7.500556 hours  7.925556 hours
## 6       NA hours       NA hours 10.939722 hours 10.678333 hours 10.785833 hours
##        2015-07-08      2015-07-09      2015-07-10      2015-07-13
## 1  7.492500 hours  6.936944 hours  7.368611 hours  7.276389 hours
## 2  8.022778 hours  7.107500 hours  7.703611 hours        NA hours
## 3  6.859167 hours  7.993611 hours  7.049167 hours  6.348889 hours
## 4  6.947222 hours  7.275000 hours  7.613889 hours  7.377778 hours
## 5  7.816944 hours  7.933889 hours  8.256389 hours  7.933889 hours
## 6 10.815000 hours 10.886667 hours 10.595278 hours 10.921389 hours
##        2015-07-14      2015-07-15      2015-07-16 2015-07-17      2015-07-20
## 1  7.127778 hours  7.624444 hours  7.273889 hours         NA  7.087222 hours
## 2  7.511111 hours  7.642500 hours  8.243056 hours         NA  7.483056 hours
## 3  7.394444 hours  7.206111 hours  6.935556 hours         NA  6.920833 hours
## 4  7.753889 hours  7.482500 hours  7.246389 hours         NA  6.864722 hours
## 5  7.827500 hours  8.157778 hours  7.786944 hours         NA  7.679722 hours
## 6 10.794167 hours 10.084444 hours 10.641667 hours         NA 10.871111 hours
##        2015-07-21      2015-07-22     2015-07-23      2015-07-24
## 1  7.253889 hours  7.878889 hours 6.681667 hours  7.480556 hours
## 2  7.643889 hours  7.806944 hours 7.933889 hours  7.861944 hours
## 3  7.205833 hours  6.644444 hours 6.942778 hours  6.858056 hours
## 4  6.827222 hours        NA hours 7.406667 hours  7.393333 hours
## 5  8.323889 hours  7.710556 hours 7.768889 hours  8.127778 hours
## 6 10.828056 hours 10.680000 hours       NA hours 10.104167 hours
##        2015-07-27      2015-07-28      2015-07-29      2015-07-30
## 1  7.473333 hours  6.705000 hours  7.148611 hours  7.232222 hours
## 2        NA hours  7.871389 hours  7.937778 hours  7.377778 hours
## 3  7.115278 hours  7.728056 hours  7.393611 hours  6.978889 hours
## 4  7.243333 hours  7.011667 hours  7.852500 hours  7.231944 hours
## 5  8.406667 hours  7.631667 hours  7.670556 hours  8.376667 hours
## 6 10.835556 hours 11.085833 hours 10.778611 hours 10.651111 hours
##        2015-07-31      2015-08-03      2015-08-04      2015-08-05
## 1  7.380556 hours  6.949444 hours  7.178889 hours  7.674722 hours
## 2  7.253056 hours  7.775000 hours  7.854167 hours  7.812778 hours
## 3  7.211389 hours  7.045556 hours  6.802222 hours        NA hours
## 4  7.274444 hours  7.653333 hours  7.283889 hours        NA hours
## 5  8.009167 hours  7.747222 hours  8.297500 hours  7.867500 hours
## 6 10.392778 hours 10.908611 hours 10.787778 hours 10.731667 hours
##        2015-08-06      2015-08-07      2015-08-10      2015-08-11
## 1  7.505556 hours  7.424722 hours  7.603889 hours  7.525556 hours
## 2  8.544722 hours  7.925278 hours  7.775000 hours  7.255556 hours
## 3  6.884444 hours  6.912222 hours  7.247500 hours  6.776389 hours
## 4  7.049722 hours  7.400000 hours  7.438056 hours  7.655556 hours
## 5  7.604722 hours  7.801111 hours  7.874444 hours  8.356111 hours
## 6 10.765833 hours 11.457222 hours 10.838611 hours 10.496667 hours
##        2015-08-12      2015-08-13      2015-08-14      2015-08-17
## 1  7.610833 hours  7.368889 hours  7.141667 hours  7.055556 hours
## 2  7.762778 hours  8.066389 hours  7.845556 hours  8.197222 hours
## 3  7.043333 hours  6.749167 hours  7.048333 hours  6.897778 hours
## 4  7.175556 hours  6.774722 hours  7.415556 hours  7.147778 hours
## 5        NA hours  8.381111 hours  8.078056 hours  8.205833 hours
## 6 10.363611 hours 10.543056 hours 10.809167 hours 10.833889 hours
##        2015-08-18      2015-08-19      2015-08-20      2015-08-21
## 1  7.628333 hours  6.920833 hours  7.219167 hours  7.730833 hours
## 2  7.487500 hours  7.893611 hours  7.662500 hours  7.526944 hours
## 3  7.293056 hours  7.033333 hours  6.747778 hours  7.011389 hours
## 4  7.672222 hours  7.311667 hours  7.275278 hours  6.842778 hours
## 5  7.552500 hours  8.276944 hours  7.889444 hours  8.049167 hours
## 6 10.802500 hours 10.704444 hours 11.118056 hours 10.690000 hours
##        2015-08-24      2015-08-25      2015-08-26      2015-08-27
## 1  7.488611 hours  7.416667 hours  7.703611 hours  7.688889 hours
## 2  7.476111 hours  7.477778 hours  7.490278 hours  7.893056 hours
## 3  6.945556 hours  6.703611 hours  6.656111 hours  6.946944 hours
## 4        NA hours  7.174444 hours  7.088889 hours  6.673056 hours
## 5  8.248611 hours  8.426667 hours  8.135000 hours  7.775278 hours
## 6 11.149722 hours 11.010556 hours 10.500833 hours 10.540556 hours
##        2015-08-28      2015-08-31      2015-09-01      2015-09-02
## 1  7.554722 hours  7.003333 hours        NA hours  7.187222 hours
## 2  7.567778 hours  7.836667 hours  6.908056 hours  7.498889 hours
## 3  6.650278 hours  6.740278 hours  7.345000 hours  7.236389 hours
## 4        NA hours  6.984167 hours  7.473889 hours  6.990278 hours
## 5  7.693611 hours  7.479444 hours  7.674444 hours  8.348333 hours
## 6 10.608889 hours 10.841389 hours 11.116667 hours 10.938889 hours
##        2015-09-03      2015-09-04      2015-09-07      2015-09-08
## 1  7.433611 hours  7.529444 hours  7.383333 hours  7.344167 hours
## 2  7.909167 hours  7.812500 hours  8.193333 hours        NA hours
## 3  7.146111 hours  7.080833 hours  6.825556 hours  7.141944 hours
## 4  7.294167 hours  7.092500 hours  7.042500 hours  7.264167 hours
## 5  8.828333 hours  8.112222 hours  7.809722 hours  7.931111 hours
## 6 11.167778 hours 10.677222 hours 10.626111 hours 10.590278 hours
##        2015-09-09      2015-09-10     2015-09-11     2015-09-14      2015-09-15
## 1  7.530000 hours  7.376111 hours 7.334167 hours 6.868333 hours  7.133889 hours
## 2  7.578611 hours  7.891389 hours 7.484722 hours 7.457778 hours  7.414444 hours
## 3  6.799722 hours  7.541389 hours 6.591944 hours 7.362500 hours  6.785000 hours
## 4  7.348056 hours  7.286111 hours 7.258611 hours 7.890833 hours  6.807778 hours
## 5  8.090000 hours  8.403333 hours 7.760556 hours 8.151944 hours  7.786389 hours
## 6 10.833333 hours 10.875278 hours       NA hours       NA hours 10.601389 hours
##        2015-09-16 2015-09-17      2015-09-18      2015-09-21      2015-09-22
## 1  7.333333 hours         NA  7.452222 hours  7.430556 hours  7.653889 hours
## 2  7.835556 hours         NA  7.404444 hours  7.703611 hours  7.265000 hours
## 3  6.935833 hours         NA  6.983333 hours  7.217222 hours  6.883611 hours
## 4  7.084167 hours         NA  7.516389 hours  7.115556 hours  6.987500 hours
## 5  7.784722 hours         NA  7.585833 hours  7.846111 hours  7.936389 hours
## 6 10.642222 hours         NA 10.629722 hours 10.520833 hours 11.008889 hours
##        2015-09-23      2015-09-24      2015-09-25      2015-09-28
## 1  7.043889 hours  7.320556 hours  7.732222 hours  7.644444 hours
## 2  7.701944 hours  8.083889 hours  8.139444 hours  7.696111 hours
## 3  7.164167 hours  7.035000 hours  6.977500 hours  7.811944 hours
## 4  7.070000 hours  7.589722 hours        NA hours  7.010833 hours
## 5  8.302778 hours  8.214444 hours  8.559444 hours  8.186944 hours
## 6 11.003889 hours 10.698611 hours 10.634722 hours 11.098056 hours
##        2015-09-29      2015-09-30      2015-10-01 2015-10-02      2015-10-05
## 1  7.592222 hours  7.367222 hours  7.787778 hours         NA  7.028056 hours
## 2  8.029722 hours  7.871667 hours  7.423333 hours         NA  8.099444 hours
## 3  7.439444 hours  6.759722 hours  7.292500 hours         NA  7.019722 hours
## 4  6.915278 hours  6.854167 hours  6.893889 hours         NA        NA hours
## 5  8.276944 hours  7.825278 hours  7.650833 hours         NA  8.505833 hours
## 6 11.173611 hours 11.058889 hours 10.725833 hours         NA 10.503056 hours
##        2015-10-06      2015-10-07      2015-10-08      2015-10-09
## 1  7.680833 hours  7.398611 hours  7.488056 hours  7.605556 hours
## 2  7.060000 hours  7.949167 hours  8.250833 hours  7.631111 hours
## 3  7.517778 hours  7.162222 hours  6.931389 hours  6.916111 hours
## 4  7.171667 hours  7.500278 hours  7.498056 hours  6.870000 hours
## 5  7.803056 hours  7.663056 hours  7.987500 hours  7.668611 hours
## 6 10.499444 hours 11.241667 hours 10.534444 hours 10.392222 hours
##        2015-10-12      2015-10-13      2015-10-14     2015-10-15
## 1  7.160556 hours        NA hours  7.488611 hours 7.572778 hours
## 2  7.089167 hours  8.302500 hours  7.571111 hours       NA hours
## 3  7.086111 hours  6.692222 hours  7.083611 hours 6.791389 hours
## 4  7.283889 hours  7.112778 hours  6.915833 hours 7.170833 hours
## 5  8.666667 hours  7.949722 hours  7.885556 hours 8.123333 hours
## 6 10.452222 hours 11.743889 hours 10.311111 hours 9.815278 hours
##        2015-10-16      2015-10-19      2015-10-20      2015-10-21
## 1  6.821944 hours  7.547222 hours  7.038333 hours        NA hours
## 2  7.631111 hours  7.941111 hours  7.571667 hours  7.078056 hours
## 3  7.015000 hours  6.554444 hours  6.970833 hours  7.002500 hours
## 4  6.800833 hours  7.098889 hours  7.488333 hours  7.048889 hours
## 5  7.808889 hours  7.615833 hours  8.500833 hours  8.316111 hours
## 6 10.243889 hours 10.991944 hours 10.487500 hours 10.817222 hours
##        2015-10-22      2015-10-23      2015-10-26      2015-10-27
## 1        NA hours  7.170278 hours  7.746111 hours  7.050833 hours
## 2  8.558889 hours  6.725278 hours  8.149444 hours  7.910000 hours
## 3  7.023333 hours  6.868611 hours  7.248611 hours  7.526944 hours
## 4  6.793611 hours  7.508889 hours  7.433333 hours  6.893056 hours
## 5  8.319722 hours  8.100556 hours  8.264167 hours  8.315833 hours
## 6 11.154722 hours 10.815278 hours 11.028333 hours 11.034167 hours
##        2015-10-28      2015-10-29      2015-10-30      2015-11-02
## 1  7.703333 hours  7.118611 hours  7.313056 hours  6.862500 hours
## 2  7.547222 hours  7.868333 hours  7.753056 hours  7.408889 hours
## 3  6.460556 hours  6.803889 hours  6.629722 hours  7.690278 hours
## 4  7.063611 hours  7.059167 hours        NA hours  7.076389 hours
## 5  7.985833 hours  8.081111 hours  7.587500 hours  8.409722 hours
## 6 10.645833 hours 10.972222 hours 10.715833 hours 10.730833 hours
##        2015-11-03      2015-11-04      2015-11-05      2015-11-06 2015-11-09
## 1  7.594722 hours  6.944444 hours  7.638333 hours  6.995833 hours         NA
## 2  7.706667 hours  7.932778 hours  7.363611 hours  7.433333 hours         NA
## 3  7.042222 hours  6.703333 hours  6.687500 hours  6.555556 hours         NA
## 4  7.244444 hours  6.825556 hours  7.513056 hours        NA hours         NA
## 5  7.522778 hours  8.116667 hours  8.128056 hours  8.208889 hours         NA
## 6 11.188889 hours 10.893333 hours 11.070000 hours 10.202500 hours         NA
##   2015-11-10 2015-11-11      2015-11-12      2015-11-13      2015-11-16
## 1         NA         NA  7.178611 hours  7.327222 hours  7.785000 hours
## 2         NA         NA  7.654444 hours  8.204167 hours  8.250833 hours
## 3         NA         NA  7.365833 hours  6.995833 hours  6.801944 hours
## 4         NA         NA  7.605278 hours  6.735833 hours  7.016389 hours
## 5         NA         NA  8.348611 hours  7.865278 hours  7.953611 hours
## 6         NA         NA 10.791389 hours 10.323333 hours 10.565000 hours
##        2015-11-17      2015-11-18      2015-11-19      2015-11-20
## 1  7.095278 hours  7.410556 hours  6.994167 hours  7.818056 hours
## 2  8.001389 hours  7.928889 hours        NA hours  7.860833 hours
## 3  7.293056 hours  7.269167 hours  6.198333 hours  7.560833 hours
## 4  6.591667 hours  7.284444 hours  7.596111 hours  7.538611 hours
## 5  7.905556 hours  7.936111 hours  7.603333 hours  8.563889 hours
## 6 10.683611 hours 10.765278 hours 10.932222 hours 10.420833 hours
##        2015-11-23      2015-11-24      2015-11-25      2015-11-26
## 1  7.318056 hours  7.587222 hours  7.416667 hours  7.434444 hours
## 2  7.692222 hours  7.698889 hours  7.520000 hours  7.903889 hours
## 3  6.943889 hours  6.664167 hours  7.063611 hours  7.457222 hours
## 4  7.132778 hours  7.808889 hours  7.133611 hours  6.899167 hours
## 5  8.445278 hours  8.543056 hours  7.890556 hours  7.627778 hours
## 6 10.588056 hours 10.492222 hours 10.784722 hours 10.779722 hours
##        2015-11-27      2015-11-30      2015-12-01      2015-12-02
## 1  7.086944 hours  7.013056 hours  7.365000 hours  7.575556 hours
## 2  7.954722 hours  7.531667 hours  7.412500 hours  8.117500 hours
## 3        NA hours  6.977778 hours  6.936944 hours  6.737500 hours
## 4  6.950556 hours  7.310000 hours  7.278611 hours  7.068611 hours
## 5  8.122222 hours  8.436944 hours  7.911111 hours  7.400000 hours
## 6 10.809444 hours 10.546944 hours 11.018333 hours 10.418889 hours
##        2015-12-03      2015-12-04      2015-12-07      2015-12-08
## 1  7.584444 hours  7.120278 hours  6.889444 hours  7.940833 hours
## 2  8.039722 hours  7.982778 hours  7.508889 hours  7.715833 hours
## 3  6.766944 hours  7.068611 hours  6.798611 hours  7.490278 hours
## 4  7.212778 hours  6.912222 hours  7.404722 hours  7.281944 hours
## 5  7.666111 hours  7.868333 hours  8.103889 hours  8.535000 hours
## 6 10.910556 hours 10.671389 hours 10.816111 hours 10.433611 hours
##        2015-12-09      2015-12-10      2015-12-11     2015-12-14
## 1  7.948333 hours  7.196944 hours  7.651944 hours 7.538889 hours
## 2  7.973333 hours  7.821667 hours  7.458056 hours 7.536111 hours
## 3  7.535000 hours        NA hours  7.199444 hours 7.096389 hours
## 4  6.877500 hours  7.513056 hours  6.966667 hours 7.055833 hours
## 5  7.854167 hours  7.921111 hours  8.695833 hours 7.589444 hours
## 6 10.858333 hours 10.562500 hours 11.368889 hours       NA hours
##        2015-12-15      2015-12-16      2015-12-17      2015-12-18
## 1        NA hours  7.551944 hours        NA hours        NA hours
## 2  7.435278 hours  8.255278 hours  7.928333 hours  7.903056 hours
## 3  6.751111 hours  6.902778 hours  7.218333 hours  6.785833 hours
## 4  6.999444 hours  7.313056 hours  7.455833 hours  7.629167 hours
## 5  8.193056 hours  8.129444 hours  8.320000 hours  7.903611 hours
## 6 10.821389 hours 10.720000 hours 10.731667 hours 10.564444 hours
##        2015-12-21      2015-12-22      2015-12-23      2015-12-24 2015-12-25
## 1  7.339167 hours  7.395833 hours  6.504722 hours  7.596389 hours         NA
## 2  7.753889 hours  7.712222 hours  7.435556 hours        NA hours         NA
## 3  7.163611 hours  6.801667 hours  6.730278 hours  6.849722 hours         NA
## 4  6.846667 hours  7.326389 hours  7.413611 hours  7.085000 hours         NA
## 5  7.665000 hours  7.957500 hours  7.786944 hours  8.249444 hours         NA
## 6 11.067222 hours 11.145833 hours 10.960833 hours 10.467778 hours         NA
##       2015-12-28      2015-12-29      2015-12-30      2015-12-31
## 1 7.773889 hours  7.315000 hours  7.778889 hours  7.080278 hours
## 2 7.614722 hours  7.982500 hours  7.986111 hours  8.227222 hours
## 3 7.023889 hours  7.438889 hours  7.538889 hours  6.786389 hours
## 4 7.447222 hours  7.416667 hours  7.366389 hours  7.133056 hours
## 5 7.662222 hours  8.268611 hours  7.953333 hours  8.018056 hours
## 6       NA hours 10.893611 hours 10.897222 hours 10.838333 hours

2.1.5.4 a.5.4.Calculate average work hours for each employee

Now, using the ‘employee_work_hours’ data frame, I create a new data frame named ‘employee_avg_work_hours’ that holds each employee’s average work hours per day in 2015.

#each employee's daily work hours are not yet stored as plain numbers (the values above print with an 'hours' unit), so we convert them to numeric before calculating an average.
library(dplyr)
employee_work_hours <- mutate(employee_work_hours, across(-EmployeeID, as.numeric))

print(head(employee_work_hours))
##   EmployeeID 2015-01-01 2015-01-02 2015-01-05 2015-01-06 2015-01-07 2015-01-08
## 1          1         NA   7.208333   7.189722   7.410833   7.006667   7.289722
## 2          2         NA   8.109167   7.454722         NA   7.396944   7.416667
## 3          3         NA   6.692500   7.265556   6.405278   6.765000   7.345000
## 4          4         NA   7.338333   7.291944   6.943056   6.919444   6.850833
## 5          5         NA   8.055556   7.988056   7.682500   7.806111   7.662222
## 6          6         NA  10.779444  10.721944  10.963611  10.298611  11.009444
##   2015-01-09 2015-01-12 2015-01-13 2015-01-14 2015-01-15 2015-01-16 2015-01-19
## 1   7.484444   7.262778   7.831111         NA   7.346944   7.267500         NA
## 2   7.150833   7.611389   7.278889         NA   7.613056   7.727500   7.577500
## 3   6.861389   7.418611   6.999722         NA   7.438333   7.210278   7.072222
## 4   7.193056   6.998611   7.306389         NA   6.876667   6.907778   6.518611
## 5   7.721667   8.365000   8.257222         NA   8.260000   8.403611         NA
## 6  11.099722  10.838056  10.270000         NA  10.651111  10.921944  11.352500
##   2015-01-20 2015-01-21 2015-01-22 2015-01-23 2015-01-26 2015-01-27 2015-01-28
## 1   6.775833   7.095000   7.050556   7.604722         NA   7.629167   7.118889
## 2   7.602778   7.905556   7.376389   7.778889         NA   7.467222   7.189722
## 3   6.920556   6.802778   7.478611   6.921667         NA   7.249167   6.305833
## 4   7.178889   6.705000   7.011389   7.145833         NA   7.254722   7.545278
## 5   7.815278   8.223056   8.290833   7.310278         NA   7.928889   7.928611
## 6  10.521389  10.992500  11.111111  10.829167         NA         NA  10.883611
##   2015-01-29 2015-01-30 2015-02-02 2015-02-03 2015-02-04 2015-02-05 2015-02-06
## 1   7.413611   6.849722   6.901667   7.203056   7.605278   7.565278   7.470000
## 2   7.259444   7.059167   7.631111   7.632500   7.644167   7.637222   7.900833
## 3   7.221944   7.024167   7.071111   6.520278   7.364167   6.358333   6.906667
## 4   6.909167   7.219722   7.067778   7.436111   6.977222   7.151389   7.145278
## 5   8.079167   8.028611   8.165000   8.028889   7.996667   8.090556   7.708611
## 6  11.044444  10.577500  10.555000         NA  10.856667  11.012500  10.803056
##   2015-02-09 2015-02-10 2015-02-11 2015-02-12 2015-02-13 2015-02-16 2015-02-17
## 1   7.601389   7.267222   7.193056   7.435278   7.205278   7.605833   7.416111
## 2         NA   8.034444   7.637778   7.964444   7.733333   7.699167   7.406944
## 3   6.659722   6.825278   6.785833   6.455833   7.144722   7.022778   6.685278
## 4   7.333889         NA   7.060833   7.341111   7.145556   7.350278   7.110556
## 5   8.186667   7.678889   7.963611   7.670833   8.720556   7.922222   7.902500
## 6  10.911667  11.298889  11.338889  10.548611  10.965000  10.961111  11.320278
##   2015-02-18 2015-02-19 2015-02-20 2015-02-23 2015-02-24 2015-02-25 2015-02-26
## 1   7.839722         NA   7.832222   7.441111   7.225000   7.583611   7.200833
## 2   7.903889   7.912778   7.424167   8.255556   7.800000   7.752222   7.846667
## 3   7.046944   6.960278   6.886389   7.754722   6.751667   6.940278   7.553056
## 4   6.872500   7.523611   6.697778   7.179722   7.064167   6.280556   7.358056
## 5   8.094167   7.846111   8.230833   7.802778   8.118889   8.032222   7.709722
## 6  10.732500  10.705000  10.732500  10.641667  11.171111  10.961667  10.946944
##   2015-02-27 2015-03-02 2015-03-03 2015-03-04 2015-03-05 2015-03-06 2015-03-09
## 1   7.385833   7.156944   6.759722   7.744444         NA   7.815000   7.408611
## 2   8.120278   7.681667   7.693611   7.927500         NA   7.339722   8.130000
## 3   6.855000   7.493056   7.054444   7.188611         NA   6.771111   6.846944
## 4   7.263889         NA   7.403333   6.463611         NA         NA   7.157222
## 5   7.967222   8.693056   7.950278   8.072778         NA   8.182778   8.356389
## 6  10.714444  10.339444  11.358889  10.933889         NA  10.573333  10.835556
##   2015-03-10 2015-03-11 2015-03-12 2015-03-13 2015-03-16 2015-03-17 2015-03-18
## 1   6.923056   7.161111   7.080000   7.310278         NA   6.915000   7.197778
## 2   7.690556   7.986667   7.522500   7.819722   7.205278   7.365000   7.267500
## 3   6.873056   7.079722   7.275556   6.821389   6.760556   7.120833   7.316667
## 4   7.397222   7.493333   7.624167   6.817222   6.724444   7.208333   7.109722
## 5   8.064722   8.143611   8.083611   8.051944   7.789722   7.950278   8.106111
## 6  10.422778  10.864167  10.755000  10.253333  11.081667  10.780833  10.656111
##   2015-03-19 2015-03-20 2015-03-23 2015-03-24 2015-03-25 2015-03-26 2015-03-27
## 1   7.847778   7.162500   7.616389   7.251944   7.445556         NA   7.555833
## 2   7.697500   7.430278   7.948056         NA   7.320833   7.925556   7.774167
## 3   6.545556         NA   6.767500   6.900278   6.981944   7.418333         NA
## 4   7.006389   7.206389   6.947778   6.926667   7.346111   7.812778         NA
## 5   7.740000   7.769444   7.674722   7.675278   7.841667   7.784722   7.898333
## 6  10.462222  10.928889  10.791389  10.821111  11.006389  10.972778  10.854444
##   2015-03-30 2015-03-31 2015-04-01 2015-04-02 2015-04-03 2015-04-06 2015-04-07
## 1   7.356944   7.865000   7.336944   7.658611   7.187778   7.040833   7.640000
## 2   7.401944   7.972778   7.728333   7.753889   7.617500   7.371389   7.473333
## 3   6.714444   7.478333   6.911944   6.833333   6.748333   6.905278   7.235000
## 4   7.042222   7.613333   7.509722   6.913889   7.069444   7.398056   7.055556
## 5   7.699167   7.803889   8.037500   8.101667   7.827778   8.355556   8.037500
## 6  11.045556  11.513611  10.860278  10.090278  11.047222  10.895833  10.772222
##   2015-04-08 2015-04-09 2015-04-10 2015-04-13 2015-04-14 2015-04-15 2015-04-16
## 1   7.427222   7.803056   7.115833   7.348056   7.146667   7.459722   7.756944
## 2   7.383056   7.087778   7.921111   8.192778   8.174167   7.475000   7.238333
## 3   6.960278   7.461667   6.755278   7.265556   7.233056   7.270833   7.060556
## 4   7.219444   7.050833   6.971944   6.834167         NA   6.495556   7.079167
## 5   7.581944   8.145556   7.946667   7.871389   8.528056   8.073333   8.168333
## 6  10.592222  11.588056  11.322500  11.540556  10.964167  11.267222  10.613889
##   2015-04-17 2015-04-20 2015-04-21 2015-04-22 2015-04-23 2015-04-24 2015-04-27
## 1   7.284722   7.695278   6.975556   7.525000   7.336389   7.562500   7.241944
## 2   7.738889   7.784167   8.466667   8.003611   7.958056   7.532500   7.639167
## 3   7.611667   7.449444   7.140833   7.446111         NA   6.914444   7.002778
## 4   6.984444   7.305833   7.077778   7.257222   7.636944   7.443611   7.374167
## 5   7.878889   8.321111         NA   7.492500   7.830278   7.842222   8.670556
## 6  11.086111  11.164722  11.021389  10.871111  10.630833  10.824444  10.830278
##   2015-04-28 2015-04-29 2015-04-30 2015-05-01 2015-05-04 2015-05-05 2015-05-06
## 1   7.889167   7.691667   7.508056         NA   7.410556   7.308611   7.456667
## 2   7.606667         NA   8.081944         NA   7.793056   7.718056   7.858056
## 3   7.343056   6.963611   7.050556         NA   6.787500   6.330278   7.111667
## 4   6.926944   7.144167   7.120833         NA   7.503889   6.727222   7.843889
## 5   7.493056   7.647778   7.785000         NA   7.910833   8.336667   8.553889
## 6  10.613611  10.101667  11.000278         NA   9.971944  11.514722  10.658056
##   2015-05-07 2015-05-08 2015-05-11 2015-05-12 2015-05-13 2015-05-14 2015-05-15
## 1   7.062778   7.615000         NA   7.444722   7.502500   7.901667   6.816944
## 2   7.608333   7.179167   7.526111   7.917222   7.480278         NA   7.397500
## 3   6.872500   6.871389   7.032500   7.101944   6.831111   7.276111   6.610278
## 4   7.156389   7.714444   7.134722   7.274444   7.174722   7.263333   7.146389
## 5   7.362500   7.673889   7.791111   8.631111   7.473611   7.964167   7.811667
## 6  10.495556  10.094167         NA  11.448889  10.783056  11.020000  10.526944
##   2015-05-18 2015-05-19 2015-05-20 2015-05-21 2015-05-22 2015-05-25 2015-05-26
## 1         NA   7.333056   7.319444   7.465278   7.128889   7.573056   7.199722
## 2         NA   8.082222   7.204167   7.714722   8.022778   8.079722   8.119167
## 3   7.237778   7.414167   7.436667   7.116111   6.832778   7.280000   6.265833
## 4   6.886944   7.508611   6.623056   7.210278         NA   7.212500   7.373056
## 5   7.521389   8.317222   7.938889   7.855000   8.155000   8.644722   8.501111
## 6  10.592222  11.003333  10.936944  10.742778  10.750833  10.698333         NA
##   2015-05-27 2015-05-28 2015-05-29 2015-06-01 2015-06-02 2015-06-03 2015-06-04
## 1   7.877778   7.447778         NA         NA   7.717222   7.514167   7.041944
## 2   7.968056   7.277778   8.052778   8.107222   7.616389   7.793611   6.920833
## 3   7.151389   6.307222   6.743611   6.788889   7.177500   7.313611   6.813889
## 4   7.582500   7.667778   7.114444   6.965556   7.801667   7.316667   7.451389
## 5   8.113333   7.747778   8.319167   8.061944   7.799167   8.192222   7.627222
## 6  11.175278  10.688889  10.705000  10.790278  10.358889  11.225833  10.940833
##   2015-06-05 2015-06-08 2015-06-09 2015-06-10 2015-06-11 2015-06-12 2015-06-15
## 1         NA   7.072778         NA   7.075278   7.575000   7.421389   7.595278
## 2   7.664167   7.583611   7.429722   8.091944   7.980278         NA   8.050278
## 3   6.684444   7.640556   6.933889   7.336389   7.211111   7.119722   7.097222
## 4   6.787222   7.283056   7.201944   7.254444   7.433333   7.121111   7.332778
## 5   8.221667   8.157500   7.806944   7.814167         NA   8.292778   8.129722
## 6  10.305833  10.858056  11.035000  10.781389  10.803611  11.107222  10.643333
##   2015-06-16 2015-06-17 2015-06-18 2015-06-19 2015-06-22 2015-06-23 2015-06-24
## 1   7.542222   7.526111   7.521389   7.248889   7.178333   6.912778   7.112778
## 2   7.766944   8.225556   7.883611   7.706389   8.000833   7.793889   7.528611
## 3   6.915278   6.994167   7.258889   6.940278   6.616667   7.142778         NA
## 4   6.905278   7.268056   7.751944   7.298333   7.568889   6.837778   7.466111
## 5   8.081667   8.349722   8.486111   7.611389   8.705556   7.821111   7.549722
## 6  10.984444  10.983611  10.922778  10.819722  11.068611  10.843889  10.275833
##   2015-06-25 2015-06-26 2015-06-29 2015-06-30 2015-07-01 2015-07-02 2015-07-03
## 1   7.753611   7.052222   7.661667   7.303056   7.723611   7.618333   7.158889
## 2   7.945833   7.679722   7.968333   7.111389   7.700278   7.807778   8.143889
## 3   6.570833   7.057222   6.770556   6.589444   6.716389   6.851944   6.928611
## 4   6.956389   7.136944   7.152778   7.199444   6.865000   7.371389   7.507500
## 5   7.900833   8.241389   7.941111   8.116944   7.708611   7.978333   7.599444
## 6  10.806389  10.813333   9.874444         NA         NA         NA  10.939722
##   2015-07-06 2015-07-07 2015-07-08 2015-07-09 2015-07-10 2015-07-13 2015-07-14
## 1   7.947500   7.651389   7.492500   6.936944   7.368611   7.276389   7.127778
## 2   7.719444   7.848611   8.022778   7.107500   7.703611         NA   7.511111
## 3   7.966389   6.763056   6.859167   7.993611   7.049167   6.348889   7.394444
## 4   7.376667   7.554444   6.947222   7.275000   7.613889   7.377778   7.753889
## 5   7.500556   7.925556   7.816944   7.933889   8.256389   7.933889   7.827500
## 6  10.678333  10.785833  10.815000  10.886667  10.595278  10.921389  10.794167
##   2015-07-15 2015-07-16 2015-07-17 2015-07-20 2015-07-21 2015-07-22 2015-07-23
## 1   7.624444   7.273889         NA   7.087222   7.253889   7.878889   6.681667
## 2   7.642500   8.243056         NA   7.483056   7.643889   7.806944   7.933889
## 3   7.206111   6.935556         NA   6.920833   7.205833   6.644444   6.942778
## 4   7.482500   7.246389         NA   6.864722   6.827222         NA   7.406667
## 5   8.157778   7.786944         NA   7.679722   8.323889   7.710556   7.768889
## 6  10.084444  10.641667         NA  10.871111  10.828056  10.680000         NA
##   2015-07-24 2015-07-27 2015-07-28 2015-07-29 2015-07-30 2015-07-31 2015-08-03
## 1   7.480556   7.473333   6.705000   7.148611   7.232222   7.380556   6.949444
## 2   7.861944         NA   7.871389   7.937778   7.377778   7.253056   7.775000
## 3   6.858056   7.115278   7.728056   7.393611   6.978889   7.211389   7.045556
## 4   7.393333   7.243333   7.011667   7.852500   7.231944   7.274444   7.653333
## 5   8.127778   8.406667   7.631667   7.670556   8.376667   8.009167   7.747222
## 6  10.104167  10.835556  11.085833  10.778611  10.651111  10.392778  10.908611
##   2015-08-04 2015-08-05 2015-08-06 2015-08-07 2015-08-10 2015-08-11 2015-08-12
## 1   7.178889   7.674722   7.505556   7.424722   7.603889   7.525556   7.610833
## 2   7.854167   7.812778   8.544722   7.925278   7.775000   7.255556   7.762778
## 3   6.802222         NA   6.884444   6.912222   7.247500   6.776389   7.043333
## 4   7.283889         NA   7.049722   7.400000   7.438056   7.655556   7.175556
## 5   8.297500   7.867500   7.604722   7.801111   7.874444   8.356111         NA
## 6  10.787778  10.731667  10.765833  11.457222  10.838611  10.496667  10.363611
##   2015-08-13 2015-08-14 2015-08-17 2015-08-18 2015-08-19 2015-08-20 2015-08-21
## 1   7.368889   7.141667   7.055556   7.628333   6.920833   7.219167   7.730833
## 2   8.066389   7.845556   8.197222   7.487500   7.893611   7.662500   7.526944
## 3   6.749167   7.048333   6.897778   7.293056   7.033333   6.747778   7.011389
## 4   6.774722   7.415556   7.147778   7.672222   7.311667   7.275278   6.842778
## 5   8.381111   8.078056   8.205833   7.552500   8.276944   7.889444   8.049167
## 6  10.543056  10.809167  10.833889  10.802500  10.704444  11.118056  10.690000
##   2015-08-24 2015-08-25 2015-08-26 2015-08-27 2015-08-28 2015-08-31 2015-09-01
## 1   7.488611   7.416667   7.703611   7.688889   7.554722   7.003333         NA
## 2   7.476111   7.477778   7.490278   7.893056   7.567778   7.836667   6.908056
## 3   6.945556   6.703611   6.656111   6.946944   6.650278   6.740278   7.345000
## 4         NA   7.174444   7.088889   6.673056         NA   6.984167   7.473889
## 5   8.248611   8.426667   8.135000   7.775278   7.693611   7.479444   7.674444
## 6  11.149722  11.010556  10.500833  10.540556  10.608889  10.841389  11.116667
##   2015-09-02 2015-09-03 2015-09-04 2015-09-07 2015-09-08 2015-09-09 2015-09-10
## 1   7.187222   7.433611   7.529444   7.383333   7.344167   7.530000   7.376111
## 2   7.498889   7.909167   7.812500   8.193333         NA   7.578611   7.891389
## 3   7.236389   7.146111   7.080833   6.825556   7.141944   6.799722   7.541389
## 4   6.990278   7.294167   7.092500   7.042500   7.264167   7.348056   7.286111
## 5   8.348333   8.828333   8.112222   7.809722   7.931111   8.090000   8.403333
## 6  10.938889  11.167778  10.677222  10.626111  10.590278  10.833333  10.875278
##   2015-09-11 2015-09-14 2015-09-15 2015-09-16 2015-09-17 2015-09-18 2015-09-21
## 1   7.334167   6.868333   7.133889   7.333333         NA   7.452222   7.430556
## 2   7.484722   7.457778   7.414444   7.835556         NA   7.404444   7.703611
## 3   6.591944   7.362500   6.785000   6.935833         NA   6.983333   7.217222
## 4   7.258611   7.890833   6.807778   7.084167         NA   7.516389   7.115556
## 5   7.760556   8.151944   7.786389   7.784722         NA   7.585833   7.846111
## 6         NA         NA  10.601389  10.642222         NA  10.629722  10.520833
##   2015-09-22 2015-09-23 2015-09-24 2015-09-25 2015-09-28 2015-09-29 2015-09-30
## 1   7.653889   7.043889   7.320556   7.732222   7.644444   7.592222   7.367222
## 2   7.265000   7.701944   8.083889   8.139444   7.696111   8.029722   7.871667
## 3   6.883611   7.164167   7.035000   6.977500   7.811944   7.439444   6.759722
## 4   6.987500   7.070000   7.589722         NA   7.010833   6.915278   6.854167
## 5   7.936389   8.302778   8.214444   8.559444   8.186944   8.276944   7.825278
## 6  11.008889  11.003889  10.698611  10.634722  11.098056  11.173611  11.058889
##   2015-10-01 2015-10-02 2015-10-05 2015-10-06 2015-10-07 2015-10-08 2015-10-09
## 1   7.787778         NA   7.028056   7.680833   7.398611   7.488056   7.605556
## 2   7.423333         NA   8.099444   7.060000   7.949167   8.250833   7.631111
## 3   7.292500         NA   7.019722   7.517778   7.162222   6.931389   6.916111
## 4   6.893889         NA         NA   7.171667   7.500278   7.498056   6.870000
## 5   7.650833         NA   8.505833   7.803056   7.663056   7.987500   7.668611
## 6  10.725833         NA  10.503056  10.499444  11.241667  10.534444  10.392222
##   2015-10-12 2015-10-13 2015-10-14 2015-10-15 2015-10-16 2015-10-19 2015-10-20
## 1   7.160556         NA   7.488611   7.572778   6.821944   7.547222   7.038333
## 2   7.089167   8.302500   7.571111         NA   7.631111   7.941111   7.571667
## 3   7.086111   6.692222   7.083611   6.791389   7.015000   6.554444   6.970833
## 4   7.283889   7.112778   6.915833   7.170833   6.800833   7.098889   7.488333
## 5   8.666667   7.949722   7.885556   8.123333   7.808889   7.615833   8.500833
## 6  10.452222  11.743889  10.311111   9.815278  10.243889  10.991944  10.487500
##   2015-10-21 2015-10-22 2015-10-23 2015-10-26 2015-10-27 2015-10-28 2015-10-29
## 1         NA         NA   7.170278   7.746111   7.050833   7.703333   7.118611
## 2   7.078056   8.558889   6.725278   8.149444   7.910000   7.547222   7.868333
## 3   7.002500   7.023333   6.868611   7.248611   7.526944   6.460556   6.803889
## 4   7.048889   6.793611   7.508889   7.433333   6.893056   7.063611   7.059167
## 5   8.316111   8.319722   8.100556   8.264167   8.315833   7.985833   8.081111
## 6  10.817222  11.154722  10.815278  11.028333  11.034167  10.645833  10.972222
##   2015-10-30 2015-11-02 2015-11-03 2015-11-04 2015-11-05 2015-11-06 2015-11-09
## 1   7.313056   6.862500   7.594722   6.944444   7.638333   6.995833         NA
## 2   7.753056   7.408889   7.706667   7.932778   7.363611   7.433333         NA
## 3   6.629722   7.690278   7.042222   6.703333   6.687500   6.555556         NA
## 4         NA   7.076389   7.244444   6.825556   7.513056         NA         NA
## 5   7.587500   8.409722   7.522778   8.116667   8.128056   8.208889         NA
## 6  10.715833  10.730833  11.188889  10.893333  11.070000  10.202500         NA
##   2015-11-10 2015-11-11 2015-11-12 2015-11-13 2015-11-16 2015-11-17 2015-11-18
## 1         NA         NA   7.178611   7.327222   7.785000   7.095278   7.410556
## 2         NA         NA   7.654444   8.204167   8.250833   8.001389   7.928889
## 3         NA         NA   7.365833   6.995833   6.801944   7.293056   7.269167
## 4         NA         NA   7.605278   6.735833   7.016389   6.591667   7.284444
## 5         NA         NA   8.348611   7.865278   7.953611   7.905556   7.936111
## 6         NA         NA  10.791389  10.323333  10.565000  10.683611  10.765278
##   2015-11-19 2015-11-20 2015-11-23 2015-11-24 2015-11-25 2015-11-26 2015-11-27
## 1   6.994167   7.818056   7.318056   7.587222   7.416667   7.434444   7.086944
## 2         NA   7.860833   7.692222   7.698889   7.520000   7.903889   7.954722
## 3   6.198333   7.560833   6.943889   6.664167   7.063611   7.457222         NA
## 4   7.596111   7.538611   7.132778   7.808889   7.133611   6.899167   6.950556
## 5   7.603333   8.563889   8.445278   8.543056   7.890556   7.627778   8.122222
## 6  10.932222  10.420833  10.588056  10.492222  10.784722  10.779722  10.809444
##   2015-11-30 2015-12-01 2015-12-02 2015-12-03 2015-12-04 2015-12-07 2015-12-08
## 1   7.013056   7.365000   7.575556   7.584444   7.120278   6.889444   7.940833
## 2   7.531667   7.412500   8.117500   8.039722   7.982778   7.508889   7.715833
## 3   6.977778   6.936944   6.737500   6.766944   7.068611   6.798611   7.490278
## 4   7.310000   7.278611   7.068611   7.212778   6.912222   7.404722   7.281944
## 5   8.436944   7.911111   7.400000   7.666111   7.868333   8.103889   8.535000
## 6  10.546944  11.018333  10.418889  10.910556  10.671389  10.816111  10.433611
##   2015-12-09 2015-12-10 2015-12-11 2015-12-14 2015-12-15 2015-12-16 2015-12-17
## 1   7.948333   7.196944   7.651944   7.538889         NA   7.551944         NA
## 2   7.973333   7.821667   7.458056   7.536111   7.435278   8.255278   7.928333
## 3   7.535000         NA   7.199444   7.096389   6.751111   6.902778   7.218333
## 4   6.877500   7.513056   6.966667   7.055833   6.999444   7.313056   7.455833
## 5   7.854167   7.921111   8.695833   7.589444   8.193056   8.129444   8.320000
## 6  10.858333  10.562500  11.368889         NA  10.821389  10.720000  10.731667
##   2015-12-18 2015-12-21 2015-12-22 2015-12-23 2015-12-24 2015-12-25 2015-12-28
## 1         NA   7.339167   7.395833   6.504722   7.596389         NA   7.773889
## 2   7.903056   7.753889   7.712222   7.435556         NA         NA   7.614722
## 3   6.785833   7.163611   6.801667   6.730278   6.849722         NA   7.023889
## 4   7.629167   6.846667   7.326389   7.413611   7.085000         NA   7.447222
## 5   7.903611   7.665000   7.957500   7.786944   8.249444         NA   7.662222
## 6  10.564444  11.067222  11.145833  10.960833  10.467778         NA         NA
##   2015-12-29 2015-12-30 2015-12-31
## 1   7.315000   7.778889   7.080278
## 2   7.982500   7.986111   8.227222
## 3   7.438889   7.538889   6.786389
## 4   7.416667   7.366389   7.133056
## 5   8.268611   7.953333   8.018056
## 6  10.893611  10.897222  10.838333
#all columns are now numeric.
#next, calculate each employee's average work hours per day in 2015, ignoring missing days (na.rm = TRUE).
employee_avg_work_hours <- employee_work_hours %>%
  mutate(avg_work_hours = rowMeans(select(., -EmployeeID), na.rm = TRUE)) %>%
  select(EmployeeID, avg_work_hours)
#according to the missing map plot, the 'employee_avg_work_hours' data frame has no missing observations.
missmap(employee_avg_work_hours, main = 'missing maps for employee_avg_work_hours data', col=c('yellow', 'black'), legend = TRUE)

print(head(employee_avg_work_hours))
##   EmployeeID avg_work_hours
## 1          1       7.373651
## 2          2       7.718969
## 3          3       7.013240
## 4          4       7.193678
## 5          5       8.006175
## 6          6      10.796096
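
The conversion and averaging steps above can be checked on a small toy example. This is a minimal sketch with made-up values (the `toy_hours` data frame and its numbers are hypothetical, not taken from the company data). It assumes the daily values start out as time differences; passing `units = "hours"` to both `difftime()` and `as.numeric()` keeps the result in hours even if R would otherwise pick a different unit automatically.

```r
# Toy clock-in / clock-out times for two employees (hypothetical values).
in_time  <- as.POSIXct(c("2015-01-02 09:00:00", "2015-01-02 10:00:00"), tz = "UTC")
out_time <- as.POSIXct(c("2015-01-02 16:30:00", "2015-01-02 18:15:00"), tz = "UTC")

# Ask for hours explicitly in both calls so the numeric value is always in hours.
day1 <- as.numeric(difftime(out_time, in_time, units = "hours"), units = "hours")

toy_hours <- data.frame(
  EmployeeID   = c(1, 2),
  `2015-01-02` = day1,
  `2015-01-05` = c(NA, 8.0),   # employee 1 has no record for this day
  `2015-01-06` = c(7.0, 8.5),
  check.names  = FALSE
)

day_cols <- setdiff(names(toy_hours), "EmployeeID")

# na.rm = TRUE averages over the days that were actually recorded.
toy_hours$avg_work_hours <- rowMeans(toy_hours[, day_cols], na.rm = TRUE)

# Number of days that contributed to each employee's average.
toy_hours$days_observed <- rowSums(!is.na(toy_hours[, day_cols]))

print(toy_hours[, c("EmployeeID", "avg_work_hours", "days_observed")])
```

Keeping a count such as `days_observed` next to the average is one way to see how many days each mean is based on, since `na.rm = TRUE` silently drops missing days.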

2.1.5.5 a.5.5.Merging all the tables into one table (data frame) for model building

Given that every data frame now shares a common column, ‘EmployeeID’, I merge these data frames (except for the in_time & out_time data frames) to create a statistical model that identifies the factors that predict employee attrition. The new data frame’s name is ‘employee_data_original’.

The missing observations map of the ‘employee_data_original’ data frame suggests that the percentage of missing observations is close to 0% (only about 13 out of 4410). Because so few missing values are unlikely to significantly influence the results of my analysis and model specification, I omit them later in my analyses and model specification.

library(purrr)

df_list <- list(employee_survey_data, employee_avg_work_hours, general_data, 
                manager_survey_data)

employee_data_original <- df_list %>% reduce(full_join, by='EmployeeID')
print(head(employee_data_original))
## # A tibble: 6 × 30
##   EmployeeID EnvironmentSatisfaction JobSatisfaction WorkLifeBalance
##        <dbl>                   <dbl>           <dbl>           <dbl>
## 1          1                       3               4               2
## 2          2                       3               2               4
## 3          3                       2               2               1
## 4          4                       4               4               3
## 5          5                       4               1               3
## 6          6                       3               2               2
## # ℹ 26 more variables: avg_work_hours <dbl>, Age <dbl>, Attrition <chr>,
## #   BusinessTravel <chr>, Department <chr>, DistanceFromHome <dbl>,
## #   Education <dbl>, EducationField <chr>, EmployeeCount <dbl>, Gender <chr>,
## #   JobLevel <dbl>, JobRole <chr>, MaritalStatus <chr>, MonthlyIncome <dbl>,
## #   NumCompaniesWorked <dbl>, Over18 <chr>, PercentSalaryHike <dbl>,
## #   StandardHours <dbl>, StockOptionLevel <dbl>, TotalWorkingYears <dbl>,
## #   TrainingTimesLastYear <dbl>, YearsAtCompany <dbl>, …
missmap(employee_data_original, main = 'missing maps', col=c('yellow', 'black'), legend = TRUE)
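
The missing map gives a visual impression; the same question can also be answered with counts. The sketch below uses two small made-up tables (`survey` and `general` are hypothetical stand-ins, not the real data frames) and base R only. It checks that the join key is unique before merging, then counts incomplete rows and missing values per column.

```r
# Toy tables sharing the key column 'EmployeeID' (hypothetical values).
survey  <- data.frame(EmployeeID = 1:4, JobSatisfaction = c(4, NA, 2, 3))
general <- data.frame(EmployeeID = 1:4, Age = c(51, 31, NA, 38))

# A duplicated key would multiply rows in the join, so verify uniqueness first.
stopifnot(anyDuplicated(survey$EmployeeID) == 0,
          anyDuplicated(general$EmployeeID) == 0)

# all = TRUE keeps every EmployeeID from both tables, like a full join.
merged <- merge(survey, general, by = "EmployeeID", all = TRUE)

# Rows with at least one missing value, as a count and as a percentage.
n_incomplete   <- sum(!complete.cases(merged))
pct_incomplete <- 100 * n_incomplete / nrow(merged)

# Missing values per column, to see which variables are affected.
na_per_column <- colSums(is.na(merged))

print(n_incomplete)
print(pct_incomplete)
print(na_per_column)
```

Running the same three summaries on ‘employee_data_original’ would show how many rows `na.omit()` will drop later and which columns account for them.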

2.2 b. Exploratory Data Analysis

To get a better sense of the employees in this data set, I visually explore each variable.

2.2.1 b.1.Writing functions for repeated ggplot code

library(ggplot2)

#given that most of the code for these plots is repetitive, I create a ggplot helper function for a categorical variable.

plot_bar_c <- function(data, x_var) {
  data <- na.omit(data)
  ggplot(data, aes_string(x = x_var, fill = x_var)) + 
    geom_bar() +
    labs(x = x_var, title = paste("Overview of Employee", x_var)) + 
    theme_minimal() +
    theme(
      axis.text = element_text(size = 10),
      axis.title = element_text(size = 12),
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
    )
}

#a ggplot function for a categorical variable with more than 6 categories (x-axis labels are rotated 45 degrees so they do not overlap)

plot_bar_c1 <- function(data, x_var) {
  data <- na.omit(data)
  ggplot(data, aes_string(x = x_var, fill = x_var)) + 
    geom_bar() +
    labs(x = x_var, title = paste("Overview of Employee", x_var)) + 
    theme_minimal() +
    theme(
      axis.text.x = element_text(size = 9, angle = 45, hjust = 1),
      axis.title = element_text(size = 12),
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
    )
}

#a ggplot function for a numeric variable
plot_bar_n <- function(data, x_var) {
  data <- na.omit(data) 
  ggplot(data, aes_string(x = x_var)) + 
    geom_bar(alpha=0.5, fill = 'red') +
    labs(x = x_var, y = "Count", title = paste("Overview of Employee", x_var)) + 
    theme_minimal() +
    theme(
      axis.text = element_text(size = 10),
      axis.title = element_text(size = 12),
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank()
    )
}

2.2.2 b.2.Plots showing the distribution of each variable

#attrition (factor) 
plot_bar_c(employee_data_original, "Attrition") +
  labs(y = "Number of Employees")

#avg_work_hours (continuous)
ggplot(employee_data_original, aes(x = avg_work_hours)) +
  geom_histogram(bins = 30, alpha = 0.5, fill = "red", color = "white") +  # 30 bins is the ggplot2 default, stated explicitly
  labs(title = "Overview of Employee Average Work Hours per Day",
       x = "Average length of work hours per day", y = "Number of Employees") +
  theme_minimal() +
    theme(
      axis.text = element_text(size = 12),
      axis.title = element_text(size = 12),
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank()
    )

#EnvironmentSatisfaction (number)
plot_bar_n(employee_data_original, "EnvironmentSatisfaction") + 
  labs(title = "Overview of Workplace Environment Satisfaction", x = "Employee's perception of Work Environment Satisfaction", y = "Number of Employees",  caption = "Workplace Environment Satisfaction refers to \n employees' perception of how satisfied they are with their current workplace environment.\n  1 refers to 'Low', 2 refers to 'Medium', 3 refers to 'High', and 4 refers to 'Very High'.") +
  theme(
    plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)))

#JobSatisfaction (number)
plot_bar_n(employee_data_original, "JobSatisfaction") +
  labs(title = "Overview of Job Satisfaction", x = "Job Satisfaction", y = "Number of Employees",  caption = "1 refers to employees perceiving their job satisfaction as 'Low', 2 refers to 'Medium', 3 refers to 'High', and 4 refers to 'Very High'.") +
  theme(
    plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)))

#WorkLifeBalance (number)
plot_bar_n(employee_data_original, "WorkLifeBalance") +
  labs(title = "Overview of Work-Life Balance", x = "Work-Life Balance", y = "Number of Employees",  caption = "1 refers to employees perceiving their work-life balance as 'Bad', 2 refers to 'Good', 3 refers to 'Better', and 4 refers to 'Best'.") +
  theme(
    plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)))

#Age
plot_bar_n(employee_data_original, "Age") + labs(y = "Number of Employees")

#BusinessTravel (factor)
plot_bar_c(employee_data_original, "BusinessTravel") + 
  labs(title = "Overview of Employee Business Travel", y = "Number of Employees", x = "Business Travel")

#Department (factor)
plot_bar_c(employee_data_original, "Department") +
  labs(title = "Overview of Employee Distribution Across Departments", y = "Number of Employees")

#DistanceFromHome
plot_bar_n(employee_data_original, "DistanceFromHome") +
  labs(title = "Overview of Employee Commute Distance in Km", y = "Number of Employees", x= "Distance From Home")

#Education
plot_bar_n(employee_data_original, "Education") +
  labs(title = "Overview of Employee Education Level", y = "Number of Employees", caption = "1 = 'Below College', 2 = 'College', 3 = 'Bachelor', 4 = 'Master', 5 = 'Doctorate'") +
  theme(
    plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)))

#EducationField (factor)
plot_bar_c1(employee_data_original, "EducationField") +
  labs(title = "Overview of Employee Education Field", y = "Number of Employees", x= "Education Field")

#Gender (factor)
plot_bar_c(employee_data_original, "Gender") +
  labs(y = "Number of Employees")

#JobLevel (this variable was collected as a numeric scale, not as a factor)

plot_bar_n(employee_data_original, "JobLevel") +
  labs(title = "Overview of Employee Distribution Across Job Levels", x = "Job Level", y = "Number of Employees", caption = "1 refers to the lowest level, while 5 refers to the highest level") +
  theme(
    plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)))

#JobRole (factor)
plot_bar_c1(employee_data_original, "JobRole") +
  labs(title = "Overview of Employees Across Job Roles", x = "Job Role", y = "Number of Employees") # 9 levels. Since there are fewer than ten, there is no pressing need to combine categories, but I may do so later if it helps improve the model.

#MaritalStatus (factor)
plot_bar_c(employee_data_original, "MaritalStatus") +
  labs(title = "Overview of Employee Distribution Across Marital Status", x = "Marital Status", y = "Number of Employees")

#MonthlyIncome
ggplot(employee_data_original, aes(x = MonthlyIncome)) +
  geom_histogram(bins = 30, alpha = 0.5, fill = "red", color = "white") +  # 30 bins is the ggplot2 default, stated explicitly
  labs(title = "Overview of Employee Monthly Income", x = "Monthly Income in Rupees", y = "Number of Employees") +
  theme_minimal() +
    theme(
      axis.text = element_text(size = 12),
      axis.title = element_text(size = 12),
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank()
    )

#NumCompaniesWorked
plot_bar_n(employee_data_original, "NumCompaniesWorked") +
  labs(title = "Overview of Number of Companies Employees Worked For", y = "Number of Employees", x = "Number of Companies Worked")

#PercentSalaryHike
plot_bar_n(employee_data_original, "PercentSalaryHike") +
  labs(title = "Overview of Percent Salary Hike For Last Year", y = "Number of Employees", x = "Percent Salary Hike For Last Year")

#StockOptionLevel
plot_bar_n(employee_data_original, "StockOptionLevel") +
  labs(title = "Overview of Employee Stock Option Level", y = "Number of Employees", x = "Employee Stock Option Level", caption = "0 is the lowest stock option level in the data, while 3 is the highest") +
  theme(
    plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)))

#TotalWorkingYears
plot_bar_n(employee_data_original, "TotalWorkingYears") +
    labs(title = "Overview of Total Number of Years the Employee Has Worked", y = "Number of Employees", x = "Total Number of Working Years")

#TrainingTimesLastYear
plot_bar_n(employee_data_original, "TrainingTimesLastYear") +
    labs(title = "Overview of Number of Trainings Employees Received Last Year", y = "Number of Employees", x = "Number of Trainings")

#YearsAtCompany
plot_bar_n(employee_data_original, "YearsAtCompany")  +
    labs(title = "Overview of Number of Years Spent at The Company", y = "Number of Employees", x = "Number of Years Spent At the Company")

#YearsSinceLastPromotion
plot_bar_n(employee_data_original, "YearsSinceLastPromotion")  +
    labs(title = "Overview of Number of Years Since Last Promotion", y = "Number of Employees", x = "Number of Years Spent Since Last Promotion")

#YearsWithCurrManager
plot_bar_n(employee_data_original, "YearsWithCurrManager") + 
    labs(title = "Overview of Number of Years with Current Manager", y = "Number of Employees", x = "Number of Years with Current Manager")

#JobInvolvement
plot_bar_n(employee_data_original, "JobInvolvement") +
  labs(title = "Overview of Employee Perceived Job Involvement", x = "Job Involvement", y = "Number of Employees",  caption = "1 refers to employees perceiving their job involvement as 'Low', 2 refers to 'Medium', 3 refers to 'High', and 4 refers to 'Very High'.") +
  theme(
    plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)))

#PerformanceRating (the axis is widened to the full 1-4 scale to show which ratings are unused)
plot_bar_n(employee_data_original, "PerformanceRating") +
  labs(title = "Overview of Employee Performance Rating", x = "Performance Rating", y = "Number of Employees") +
  scale_x_continuous(breaks = 1:4, limits = c(1, 5))

2.3 c.Further Data Preparation

2.3.1 c.1.Removing unnecessary variables & changing data types for variables

The variables EmployeeCount, Over18, and StandardHours take the same value for every employee, so they cannot explain differences in attrition and are removed. EmployeeID is also removed: it is only a unique identifier for each employee, and the data frames have already been merged on it.

I also change some variables’ data types to be appropriate for model building.
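
As a rough illustration of the first step, constant columns can be detected programmatically rather than by eye. The sketch below runs on a small made-up data frame (`toy` and its values are assumptions for illustration, not the report's data) and flags every column with at most one distinct non-missing value.

```r
# Toy data standing in for the merged employee table (values are made up).
toy <- data.frame(
  EmployeeCount = c(1, 1, 1, 1),
  StandardHours = c(8, 8, 8, 8),
  Over18        = c("Y", "Y", "Y", "Y"),
  Age           = c(51, 31, 32, 38),
  stringsAsFactors = FALSE
)

# A column carries no information if it has a single distinct non-missing value.
is_constant <- vapply(
  toy,
  function(col) length(unique(na.omit(col))) <= 1,
  logical(1)
)
constant_cols <- names(toy)[is_constant]
print(constant_cols)

# Drop the constant columns while keeping the data frame structure.
toy_reduced <- toy[, !is_constant, drop = FALSE]
print(names(toy_reduced))
```

Running the same check on the real data before calling `select()` would guard against dropping a column that only looks constant in the first few rows.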

library(dplyr) # load dplyr before select() is called
employee_data_original <- select(employee_data_original, -EmployeeID, -Over18, -StandardHours, -EmployeeCount)


#convert the categorical variables, which were read in as character, into factors
employee_data_original$Attrition <- factor(employee_data_original$Attrition)
employee_data_original$BusinessTravel <- factor(employee_data_original$BusinessTravel)
employee_data_original$Department <- factor(employee_data_original$Department)
employee_data_original$EducationField <- factor(employee_data_original$EducationField)
employee_data_original$Gender <- factor(employee_data_original$Gender)
employee_data_original$JobRole <- factor(employee_data_original$JobRole)
employee_data_original$MaritalStatus <- factor(employee_data_original$MaritalStatus)

employee_data_original_original <- employee_data_original #keep a copy with the original factor variables, taken before dummy encoding, so the data can be plotted later.

#dummy (one-hot) encode the categorical variables, since the model needs numeric inputs: each categorical variable becomes several binary (1/0) variables, one per level.
#additionally, recode the Attrition text labels as numbers (Yes = 1, No = 0)
head(employee_data_original$BusinessTravel)
## [1] Travel_Rarely     Travel_Frequently Travel_Frequently Non-Travel       
## [5] Travel_Rarely     Travel_Rarely    
## Levels: Non-Travel Travel_Frequently Travel_Rarely
employee_data_original$Attrition <- ifelse(employee_data_original$Attrition == "Yes", 1, 0)

employee_data_original$Business_travel_rarely <- ifelse(employee_data_original$BusinessTravel == "Travel_Rarely", 1, 0)
employee_data_original$Business_travel_frequently <- ifelse(employee_data_original$BusinessTravel == "Travel_Frequently", 1, 0)
employee_data_original$Business_travel_none <- ifelse(employee_data_original$BusinessTravel == "Non-Travel", 1, 0)

employee_data_original$Department_HR <- ifelse(employee_data_original$Department == "Human Resources", 1,0)
employee_data_original$Department_RnD <- ifelse(employee_data_original$Department == "Research & Development", 1,0)
employee_data_original$Department_Sales <- ifelse(employee_data_original$Department == "Sales", 1,0)

employee_data_original$EducationField_HR <- ifelse(employee_data_original$EducationField == "Human Resources", 1,0)
employee_data_original$EducationField_Life_Science <-   ifelse(employee_data_original$EducationField == "Life Sciences", 1,0)
employee_data_original$EducationField_Marketing <- ifelse(employee_data_original$EducationField == "Marketing", 1,0)
employee_data_original$EducationField_Medical <-   ifelse(employee_data_original$EducationField == "Medical", 1,0)
employee_data_original$EducationField_Other <- ifelse(employee_data_original$EducationField == "Other", 1,0)
employee_data_original$EducationField_Technical_Degree <-   ifelse(employee_data_original$EducationField == "Technical Degree", 1,0)

employee_data_original$Gender_Female <- ifelse(employee_data_original$Gender == "Female", 1,0)

employee_data_original$JobRole_healthcare_rep <- ifelse(employee_data_original$JobRole == "Healthcare Representative", 1,0)
employee_data_original$JobRole_Human_Resources <- ifelse(employee_data_original$JobRole == "Human Resources", 1,0)
employee_data_original$JobRole_Laboratory_Technician <- ifelse(employee_data_original$JobRole == "Laboratory Technician", 1,0)
employee_data_original$JobRole_Manager <- ifelse(employee_data_original$JobRole == "Manager", 1,0)
employee_data_original$JobRole_Manufacturing_Director <- ifelse(employee_data_original$JobRole == "Manufacturing Director", 1,0)
employee_data_original$JobRole_Research_Director <- ifelse(employee_data_original$JobRole == "Research Director", 1,0)
employee_data_original$JobRole_Research_Scientist <- ifelse(employee_data_original$JobRole == "Research Scientist", 1,0)
employee_data_original$JobRole_Sales_Executive <- ifelse(employee_data_original$JobRole == "Sales Executive", 1,0)
employee_data_original$JobRole_Sales_Representative <- ifelse(employee_data_original$JobRole == "Sales Representative", 1,0)

employee_data_original$divorced <- ifelse(employee_data_original$MaritalStatus == "Divorced", 1,0)
employee_data_original$married <- ifelse(employee_data_original$MaritalStatus == "Married", 1,0)
employee_data_original$single <- ifelse(employee_data_original$MaritalStatus == "Single", 1,0)

employee_data_original <- select(employee_data_original, -BusinessTravel, -Department, -EducationField, -Gender, -JobRole, -MaritalStatus)

#now every variable is numerical and machine learning ready
str(employee_data_original)
## tibble [4,410 × 45] (S3: tbl_df/tbl/data.frame)
##  $ EnvironmentSatisfaction        : num [1:4410] 3 3 2 4 4 3 1 1 2 2 ...
##  $ JobSatisfaction                : num [1:4410] 4 2 2 4 1 2 3 2 4 1 ...
##  $ WorkLifeBalance                : num [1:4410] 2 4 1 3 3 2 1 3 3 3 ...
##  $ avg_work_hours                 : num [1:4410] 7.37 7.72 7.01 7.19 8.01 ...
##  $ Age                            : num [1:4410] 51 31 32 38 32 46 28 29 31 25 ...
##  $ Attrition                      : num [1:4410] 0 1 0 0 0 0 1 0 0 0 ...
##  $ DistanceFromHome               : num [1:4410] 6 10 17 2 10 8 11 18 1 7 ...
##  $ Education                      : num [1:4410] 2 1 4 5 1 3 2 3 3 4 ...
##  $ JobLevel                       : num [1:4410] 1 1 4 3 1 4 2 2 3 4 ...
##  $ MonthlyIncome                  : num [1:4410] 131160 41890 193280 83210 23420 ...
##  $ NumCompaniesWorked             : num [1:4410] 1 0 1 3 4 3 2 2 0 1 ...
##  $ PercentSalaryHike              : num [1:4410] 11 23 15 11 12 13 20 22 21 13 ...
##  $ StockOptionLevel               : num [1:4410] 0 1 3 3 2 0 1 3 0 1 ...
##  $ TotalWorkingYears              : num [1:4410] 1 6 5 13 9 28 5 10 10 6 ...
##  $ TrainingTimesLastYear          : num [1:4410] 6 3 2 5 2 5 2 2 2 2 ...
##  $ YearsAtCompany                 : num [1:4410] 1 5 5 8 6 7 0 0 9 6 ...
##  $ YearsSinceLastPromotion        : num [1:4410] 0 1 0 7 0 7 0 0 7 1 ...
##  $ YearsWithCurrManager           : num [1:4410] 0 4 3 5 4 7 0 0 8 5 ...
##  $ JobInvolvement                 : num [1:4410] 3 2 3 2 3 3 3 3 3 3 ...
##  $ PerformanceRating              : num [1:4410] 3 4 3 3 3 3 4 4 4 3 ...
##  $ Business_travel_rarely         : num [1:4410] 1 0 0 0 1 1 1 1 1 0 ...
##  $ Business_travel_frequently     : num [1:4410] 0 1 1 0 0 0 0 0 0 0 ...
##  $ Business_travel_none           : num [1:4410] 0 0 0 1 0 0 0 0 0 1 ...
##  $ Department_HR                  : num [1:4410] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Department_RnD                 : num [1:4410] 0 1 1 1 1 1 1 1 1 1 ...
##  $ Department_Sales               : num [1:4410] 1 0 0 0 0 0 0 0 0 0 ...
##  $ EducationField_HR              : num [1:4410] 0 0 0 0 0 0 0 0 0 0 ...
##  $ EducationField_Life_Science    : num [1:4410] 1 1 0 1 0 1 0 1 1 0 ...
##  $ EducationField_Marketing       : num [1:4410] 0 0 0 0 0 0 0 0 0 0 ...
##  $ EducationField_Medical         : num [1:4410] 0 0 0 0 1 0 1 0 0 1 ...
##  $ EducationField_Other           : num [1:4410] 0 0 1 0 0 0 0 0 0 0 ...
##  $ EducationField_Technical_Degree: num [1:4410] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Gender_Female                  : num [1:4410] 1 1 0 0 0 1 0 0 0 1 ...
##  $ JobRole_healthcare_rep         : num [1:4410] 1 0 0 0 0 0 0 0 0 0 ...
##  $ JobRole_Human_Resources        : num [1:4410] 0 0 0 1 0 0 0 0 0 0 ...
##  $ JobRole_Laboratory_Technician  : num [1:4410] 0 0 0 0 0 0 0 0 1 1 ...
##  $ JobRole_Manager                : num [1:4410] 0 0 0 0 0 0 0 0 0 0 ...
##  $ JobRole_Manufacturing_Director : num [1:4410] 0 0 0 0 0 0 0 0 0 0 ...
##  $ JobRole_Research_Director      : num [1:4410] 0 0 0 0 0 1 0 0 0 0 ...
##  $ JobRole_Research_Scientist     : num [1:4410] 0 1 0 0 0 0 0 0 0 0 ...
##  $ JobRole_Sales_Executive        : num [1:4410] 0 0 1 0 1 0 1 1 0 0 ...
##  $ JobRole_Sales_Representative   : num [1:4410] 0 0 0 0 0 0 0 0 0 0 ...
##  $ divorced                       : num [1:4410] 0 0 0 0 0 0 0 0 0 1 ...
##  $ married                        : num [1:4410] 1 0 1 1 0 1 0 1 1 0 ...
##  $ single                         : num [1:4410] 0 1 0 0 1 0 1 0 0 0 ...
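
The manual `ifelse()` encoding above can also be produced in one step. As a minimal sketch on made-up data (the `toy` data frame and its values are assumptions, not the report's data), base R's `model.matrix()` expands each factor into 0/1 columns. Unlike the manual approach, it drops one reference level per factor when the formula has an intercept, because a full set of dummies for a variable always sums to 1 and is therefore redundant alongside the intercept.

```r
# Toy data: one factor and one numeric column (values are made up).
toy <- data.frame(
  MaritalStatus = factor(c("Married", "Single", "Divorced", "Single")),
  Age = c(51, 31, 32, 38)
)

# model.matrix() builds the design matrix. Factor levels are sorted
# alphabetically, so "Divorced" becomes the reference level and gets no column.
design <- model.matrix(~ MaritalStatus + Age, data = toy)

# Remove the intercept column to keep only the encoded predictors.
encoded <- as.data.frame(design[, -1, drop = FALSE])
print(colnames(encoded))
print(encoded)
```

Passing the factor columns directly to `glm()` has the same effect, since `glm()` builds its design matrix this way internally.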

2.4 d.Model Building

2.4.1 d.1.Estimating the logistic regression model

## 
## Call:
## glm(formula = Attrition ~ ., family = binomial(link = "logit"), 
##     data = employee_data_original)
## 
## Coefficients: (5 not defined because of singularities)
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     -3.465e-01  7.427e-01  -0.467  0.64076    
## EnvironmentSatisfaction         -3.823e-01  4.344e-02  -8.802  < 2e-16 ***
## JobSatisfaction                 -3.569e-01  4.345e-02  -8.213  < 2e-16 ***
## WorkLifeBalance                 -3.492e-01  6.561e-02  -5.322 1.03e-07 ***
## avg_work_hours                   4.415e-01  3.417e-02  12.920  < 2e-16 ***
## Age                             -3.325e-02  7.546e-03  -4.407 1.05e-05 ***
## DistanceFromHome                -3.147e-03  6.018e-03  -0.523  0.60099    
## Education                       -5.998e-02  4.724e-02  -1.270  0.20423    
## JobLevel                        -6.360e-02  4.386e-02  -1.450  0.14709    
## MonthlyIncome                   -1.105e-06  1.040e-06  -1.063  0.28797    
## NumCompaniesWorked               1.481e-01  2.049e-02   7.229 4.85e-13 ***
## PercentSalaryHike                1.354e-02  2.061e-02   0.657  0.51125    
## StockOptionLevel                -7.781e-02  5.683e-02  -1.369  0.17098    
## TotalWorkingYears               -8.161e-02  1.365e-02  -5.980 2.24e-09 ***
## TrainingTimesLastYear           -1.555e-01  3.817e-02  -4.073 4.64e-05 ***
## YearsAtCompany                   3.801e-02  1.967e-02   1.933  0.05328 .  
## YearsSinceLastPromotion          1.643e-01  2.212e-02   7.430 1.09e-13 ***
## YearsWithCurrManager            -1.772e-01  2.478e-02  -7.152 8.57e-13 ***
## JobInvolvement                  -9.196e-02  6.597e-02  -1.394  0.16330    
## PerformanceRating               -5.222e-03  2.050e-01  -0.025  0.97968    
## Business_travel_rarely           6.237e-01  1.961e-01   3.181  0.00147 ** 
## Business_travel_frequently       1.413e+00  2.103e-01   6.717 1.85e-11 ***
## Business_travel_none                    NA         NA      NA       NA    
## Department_HR                    5.859e-01  2.982e-01   1.965  0.04945 *  
## Department_RnD                  -1.789e-03  1.263e-01  -0.014  0.98870    
## Department_Sales                        NA         NA      NA       NA    
## EducationField_HR                9.314e-01  4.195e-01   2.220  0.02639 *  
## EducationField_Life_Science      3.372e-01  1.913e-01   1.762  0.07799 .  
## EducationField_Marketing         8.311e-02  2.443e-01   0.340  0.73371    
## EducationField_Medical           2.653e-01  1.969e-01   1.347  0.17792    
## EducationField_Other            -9.667e-03  2.787e-01  -0.035  0.97233    
## EducationField_Technical_Degree         NA         NA      NA       NA    
## Gender_Female                   -6.657e-02  9.840e-02  -0.677  0.49870    
## JobRole_healthcare_rep           1.302e-01  2.580e-01   0.505  0.61377    
## JobRole_Human_Resources         -2.671e-03  3.324e-01  -0.008  0.99359    
## JobRole_Laboratory_Technician    2.527e-01  2.298e-01   1.099  0.27157    
## JobRole_Manager                 -2.042e-01  2.782e-01  -0.734  0.46290    
## JobRole_Manufacturing_Director  -4.100e-01  2.624e-01  -1.562  0.11820    
## JobRole_Research_Director        7.296e-01  2.699e-01   2.703  0.00687 ** 
## JobRole_Research_Scientist       3.272e-01  2.244e-01   1.458  0.14477    
## JobRole_Sales_Executive          4.435e-01  2.233e-01   1.986  0.04699 *  
## JobRole_Sales_Representative            NA         NA      NA       NA    
## divorced                        -1.177e+00  1.430e-01  -8.228  < 2e-16 ***
## married                         -8.852e-01  1.063e-01  -8.325  < 2e-16 ***
## single                                  NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3804.3  on 4299  degrees of freedom
## Residual deviance: 2946.7  on 4260  degrees of freedom
##   (110 observations deleted due to missingness)
## AIC: 3026.7
## 
## Number of Fisher Scoring iterations: 6
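
Two quantities in the summary above can be checked by hand. The sketch below copies a few estimates and the deviance figures from the printed output (the variable names `coefs`, `n_params`, and `aic_check` are illustrative). It converts logit coefficients into odds ratios with `exp()`, and it reproduces the reported AIC from the residual deviance, which works here because the response is a 0/1 outcome, so AIC equals the residual deviance plus twice the number of estimated coefficients.

```r
# Estimates copied from the coefficient table above.
coefs <- c(
  avg_work_hours             = 0.4415,
  EnvironmentSatisfaction    = -0.3823,
  Business_travel_frequently = 1.413
)

# exp(coefficient) is the multiplicative change in the odds of attrition
# for a one-unit increase in the predictor, holding the others fixed.
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))

# Number of estimated coefficients: the drop in degrees of freedom from the
# null model to the fitted model, plus one for the intercept.
n_params <- (4299 - 4260) + 1

# For a binary (0/1) response, AIC = residual deviance + 2 * n_params.
aic_check <- 2946.7 + 2 * n_params
print(aic_check)
```

An odds ratio above 1 (as for avg_work_hours) means higher odds of leaving as the predictor increases, and a ratio below 1 (as for EnvironmentSatisfaction) means lower odds. The five `NA` rows in the table do not count towards `n_params`, since those redundant dummy variables were not estimated.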

2.4.2 d.2.Improving the model by using the stepwise selection method

This method iteratively removes one predictor at a time from the model, dropping predictors that do not meaningfully improve its fit. The goal is to find the best-fitting model by comparing the AIC of candidate models with different combinations of predictors. A lower AIC indicates a better model.
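
The code behind this step is not shown in the report, so the following is only a minimal sketch of backward stepwise selection, assuming base R's `stats::step()` and simulated data. The data frame `sim`, its columns, and the simulation settings are all made up for illustration.

```r
set.seed(1)
n <- 300

# Simulated data: `hours` truly affects the outcome, `noise` does not.
sim <- data.frame(
  hours = rnorm(n, mean = 8, sd = 1),
  noise = rnorm(n)
)
sim$left <- rbinom(n, size = 1, prob = plogis(-8 + sim$hours))

# Full logistic regression with every candidate predictor.
full <- glm(left ~ hours + noise, family = binomial(link = "logit"), data = sim)

# Backward elimination: at each step, drop the term whose removal lowers AIC
# the most, and stop when no removal lowers AIC. trace = 0 hides the step log.
reduced <- step(full, direction = "backward", trace = 0)

print(formula(reduced))
print(c(full = AIC(full), reduced = AIC(reduced)))
```

With `trace = 1` (the default), `step()` prints a log of the same form as the output in this section: one table per step listing the deviance and AIC that would result from dropping each remaining term.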

It is also advisable to cross-check the model and feature selection with other tests, such as the chi-squared test and ANOVA. However, given the time constraint, I skip this step.

This analysis suggests that the following variables significantly predict employee attrition:

avg_work_hours, EnvironmentSatisfaction, JobSatisfaction, MaritalStatus, NumCompaniesWorked, YearsSinceLastPromotion, YearsWithCurrManager, BusinessTravel, TotalWorkingYears, WorkLifeBalance, TrainingTimesLastYear, Age, Department.

Therefore, to reduce employee attrition most effectively, the company needs to lower the avg_work_hours of its employees, particularly those who work more than 8 hours a day. avg_work_hours is the most important predictor: in the stepwise output, dropping it raises the AIC more than dropping any other variable. I will discuss recommendations for the senior leadership in section 2 of this document.

## Start:  AIC=3026.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Business_travel_none + Department_HR + 
##     Department_RnD + Department_Sales + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Marketing + EducationField_Medical + EducationField_Other + 
##     EducationField_Technical_Degree + Gender_Female + JobRole_healthcare_rep + 
##     JobRole_Human_Resources + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + JobRole_Sales_Representative + 
##     divorced + married + single
## 
## 
## Step:  AIC=3026.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Business_travel_none + Department_HR + 
##     Department_RnD + Department_Sales + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Marketing + EducationField_Medical + EducationField_Other + 
##     EducationField_Technical_Degree + Gender_Female + JobRole_healthcare_rep + 
##     JobRole_Human_Resources + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + JobRole_Sales_Representative + 
##     divorced + married
## 
## 
## Step:  AIC=3026.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Business_travel_none + Department_HR + 
##     Department_RnD + Department_Sales + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Marketing + EducationField_Medical + EducationField_Other + 
##     EducationField_Technical_Degree + Gender_Female + JobRole_healthcare_rep + 
##     JobRole_Human_Resources + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
## 
## Step:  AIC=3026.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Business_travel_none + Department_HR + 
##     Department_RnD + Department_Sales + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Marketing + EducationField_Medical + EducationField_Other + 
##     Gender_Female + JobRole_healthcare_rep + JobRole_Human_Resources + 
##     JobRole_Laboratory_Technician + JobRole_Manager + JobRole_Manufacturing_Director + 
##     JobRole_Research_Director + JobRole_Research_Scientist + 
##     JobRole_Sales_Executive + divorced + married
## 
## 
## Step:  AIC=3026.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Business_travel_none + Department_HR + 
##     Department_RnD + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Marketing + EducationField_Medical + EducationField_Other + 
##     Gender_Female + JobRole_healthcare_rep + JobRole_Human_Resources + 
##     JobRole_Laboratory_Technician + JobRole_Manager + JobRole_Manufacturing_Director + 
##     JobRole_Research_Director + JobRole_Research_Scientist + 
##     JobRole_Sales_Executive + divorced + married
## 
## 
## Step:  AIC=3026.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Department_HR + Department_RnD + 
##     EducationField_HR + EducationField_Life_Science + EducationField_Marketing + 
##     EducationField_Medical + EducationField_Other + Gender_Female + 
##     JobRole_healthcare_rep + JobRole_Human_Resources + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - JobRole_Human_Resources         1   2946.7 3024.7
## - Department_RnD                  1   2946.7 3024.7
## - PerformanceRating               1   2946.7 3024.7
## - EducationField_Other            1   2946.7 3024.7
## - EducationField_Marketing        1   2946.8 3024.8
## - JobRole_healthcare_rep          1   2947.0 3025.0
## - DistanceFromHome                1   2947.0 3025.0
## - PercentSalaryHike               1   2947.2 3025.2
## - Gender_Female                   1   2947.2 3025.2
## - JobRole_Manager                 1   2947.3 3025.3
## - MonthlyIncome                   1   2947.9 3025.9
## - JobRole_Laboratory_Technician   1   2948.0 3026.0
## - Education                       1   2948.3 3026.3
## - EducationField_Medical          1   2948.6 3026.6
## - StockOptionLevel                1   2948.6 3026.6
## - JobInvolvement                  1   2948.7 3026.7
## <none>                                2946.7 3026.7
## - JobLevel                        1   2948.8 3026.8
## - JobRole_Research_Scientist      1   2948.9 3026.9
## - JobRole_Manufacturing_Director  1   2949.1 3027.1
## - EducationField_Life_Science     1   2950.0 3028.0
## - Department_HR                   1   2950.4 3028.4
## - YearsAtCompany                  1   2950.4 3028.4
## - JobRole_Sales_Executive         1   2950.8 3028.8
## - EducationField_HR               1   2951.8 3029.8
## - JobRole_Research_Director       1   2954.1 3032.1
## - Business_travel_rarely          1   2958.0 3036.0
## - TrainingTimesLastYear           1   2963.8 3041.8
## - Age                             1   2966.9 3045.0
## - WorkLifeBalance                 1   2974.9 3052.9
## - TotalWorkingYears               1   2986.7 3064.7
## - YearsWithCurrManager            1   2996.4 3074.4
## - NumCompaniesWorked              1   2997.6 3075.6
## - Business_travel_frequently      1   2999.8 3077.8
## - YearsSinceLastPromotion         1   3002.8 3080.8
## - JobSatisfaction                 1   3015.6 3093.6
## - married                         1   3017.4 3095.4
## - divorced                        1   3021.7 3099.7
## - EnvironmentSatisfaction         1   3026.2 3104.2
## - avg_work_hours                  1   3117.8 3195.9
## 
## Step:  AIC=3024.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Department_HR + Department_RnD + 
##     EducationField_HR + EducationField_Life_Science + EducationField_Marketing + 
##     EducationField_Medical + EducationField_Other + Gender_Female + 
##     JobRole_healthcare_rep + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - Department_RnD                  1   2946.7 3022.7
## - PerformanceRating               1   2946.7 3022.7
## - EducationField_Other            1   2946.7 3022.7
## - EducationField_Marketing        1   2946.8 3022.8
## - DistanceFromHome                1   2947.0 3023.0
## - JobRole_healthcare_rep          1   2947.1 3023.1
## - PercentSalaryHike               1   2947.2 3023.2
## - Gender_Female                   1   2947.2 3023.2
## - JobRole_Manager                 1   2947.4 3023.4
## - MonthlyIncome                   1   2947.9 3023.9
## - Education                       1   2948.3 3024.3
## - JobRole_Laboratory_Technician   1   2948.4 3024.4
## - EducationField_Medical          1   2948.6 3024.6
## - StockOptionLevel                1   2948.6 3024.6
## - JobInvolvement                  1   2948.7 3024.7
## <none>                                2946.7 3024.7
## - JobLevel                        1   2948.8 3024.8
## - JobRole_Research_Scientist      1   2949.8 3025.8
## - JobRole_Manufacturing_Director  1   2949.8 3025.8
## - EducationField_Life_Science     1   2950.0 3026.0
## - Department_HR                   1   2950.4 3026.4
## - YearsAtCompany                  1   2950.4 3026.4
## - EducationField_HR               1   2951.8 3027.8
## - JobRole_Sales_Executive         1   2952.5 3028.5
## - JobRole_Research_Director       1   2955.9 3031.9
## - Business_travel_rarely          1   2958.0 3034.0
## - TrainingTimesLastYear           1   2963.8 3039.9
## - Age                             1   2966.9 3043.0
## - WorkLifeBalance                 1   2974.9 3050.9
## - TotalWorkingYears               1   2986.7 3062.7
## - YearsWithCurrManager            1   2996.4 3072.4
## - NumCompaniesWorked              1   2997.6 3073.6
## - Business_travel_frequently      1   2999.8 3075.8
## - YearsSinceLastPromotion         1   3002.8 3078.8
## - JobSatisfaction                 1   3015.6 3091.6
## - married                         1   3017.4 3093.4
## - divorced                        1   3021.8 3097.9
## - EnvironmentSatisfaction         1   3026.2 3102.2
## - avg_work_hours                  1   3117.9 3193.9
## 
## Step:  AIC=3022.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + PerformanceRating + Business_travel_rarely + 
##     Business_travel_frequently + Department_HR + EducationField_HR + 
##     EducationField_Life_Science + EducationField_Marketing + 
##     EducationField_Medical + EducationField_Other + Gender_Female + 
##     JobRole_healthcare_rep + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - PerformanceRating               1   2946.7 3020.7
## - EducationField_Other            1   2946.7 3020.7
## - EducationField_Marketing        1   2946.9 3020.9
## - DistanceFromHome                1   2947.0 3021.0
## - JobRole_healthcare_rep          1   2947.1 3021.1
## - PercentSalaryHike               1   2947.2 3021.2
## - Gender_Female                   1   2947.2 3021.2
## - JobRole_Manager                 1   2947.4 3021.4
## - MonthlyIncome                   1   2947.9 3021.9
## - Education                       1   2948.3 3022.3
## - JobRole_Laboratory_Technician   1   2948.4 3022.4
## - EducationField_Medical          1   2948.6 3022.6
## - StockOptionLevel                1   2948.6 3022.6
## - JobInvolvement                  1   2948.7 3022.7
## <none>                                2946.7 3022.7
## - JobLevel                        1   2948.8 3022.9
## - JobRole_Research_Scientist      1   2949.8 3023.8
## - JobRole_Manufacturing_Director  1   2949.8 3023.8
## - EducationField_Life_Science     1   2950.0 3024.0
## - YearsAtCompany                  1   2950.4 3024.4
## - Department_HR                   1   2950.7 3024.7
## - EducationField_HR               1   2951.8 3025.8
## - JobRole_Sales_Executive         1   2952.5 3026.5
## - JobRole_Research_Director       1   2955.9 3029.9
## - Business_travel_rarely          1   2958.0 3032.0
## - TrainingTimesLastYear           1   2963.8 3037.9
## - Age                             1   2967.1 3041.1
## - WorkLifeBalance                 1   2974.9 3048.9
## - TotalWorkingYears               1   2986.8 3060.8
## - YearsWithCurrManager            1   2996.4 3070.4
## - NumCompaniesWorked              1   2997.8 3071.8
## - Business_travel_frequently      1   2999.9 3074.0
## - YearsSinceLastPromotion         1   3002.8 3076.8
## - JobSatisfaction                 1   3015.6 3089.6
## - married                         1   3017.5 3091.5
## - divorced                        1   3021.9 3095.9
## - EnvironmentSatisfaction         1   3026.3 3100.3
## - avg_work_hours                  1   3118.2 3192.2
## 
## Step:  AIC=3020.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + Business_travel_rarely + Business_travel_frequently + 
##     Department_HR + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Marketing + EducationField_Medical + EducationField_Other + 
##     Gender_Female + JobRole_healthcare_rep + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - EducationField_Other            1   2946.7 3018.7
## - EducationField_Marketing        1   2946.9 3018.9
## - DistanceFromHome                1   2947.0 3019.0
## - JobRole_healthcare_rep          1   2947.1 3019.1
## - Gender_Female                   1   2947.2 3019.2
## - JobRole_Manager                 1   2947.4 3019.4
## - PercentSalaryHike               1   2947.8 3019.8
## - MonthlyIncome                   1   2947.9 3019.9
## - Education                       1   2948.3 3020.3
## - JobRole_Laboratory_Technician   1   2948.4 3020.4
## - EducationField_Medical          1   2948.6 3020.6
## - StockOptionLevel                1   2948.6 3020.6
## - JobInvolvement                  1   2948.7 3020.7
## <none>                                2946.7 3020.7
## - JobLevel                        1   2948.8 3020.9
## - JobRole_Research_Scientist      1   2949.8 3021.8
## - JobRole_Manufacturing_Director  1   2949.8 3021.8
## - EducationField_Life_Science     1   2950.0 3022.0
## - YearsAtCompany                  1   2950.4 3022.4
## - Department_HR                   1   2950.7 3022.7
## - EducationField_HR               1   2951.8 3023.8
## - JobRole_Sales_Executive         1   2952.5 3024.5
## - JobRole_Research_Director       1   2955.9 3027.9
## - Business_travel_rarely          1   2958.0 3030.0
## - TrainingTimesLastYear           1   2963.9 3035.9
## - Age                             1   2967.1 3039.1
## - WorkLifeBalance                 1   2974.9 3047.0
## - TotalWorkingYears               1   2986.8 3058.8
## - YearsWithCurrManager            1   2996.4 3068.4
## - NumCompaniesWorked              1   2997.8 3069.8
## - Business_travel_frequently      1   3000.0 3072.0
## - YearsSinceLastPromotion         1   3002.8 3074.8
## - JobSatisfaction                 1   3015.7 3087.7
## - married                         1   3017.6 3089.6
## - divorced                        1   3021.9 3093.9
## - EnvironmentSatisfaction         1   3026.4 3098.4
## - avg_work_hours                  1   3118.6 3190.6
## 
## Step:  AIC=3018.72
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + Business_travel_rarely + Business_travel_frequently + 
##     Department_HR + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Marketing + EducationField_Medical + Gender_Female + 
##     JobRole_healthcare_rep + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - EducationField_Marketing        1   2946.9 3016.9
## - DistanceFromHome                1   2947.0 3017.0
## - JobRole_healthcare_rep          1   2947.1 3017.1
## - Gender_Female                   1   2947.2 3017.2
## - JobRole_Manager                 1   2947.4 3017.4
## - PercentSalaryHike               1   2947.8 3017.8
## - MonthlyIncome                   1   2947.9 3017.9
## - Education                       1   2948.3 3018.4
## - JobRole_Laboratory_Technician   1   2948.4 3018.4
## - StockOptionLevel                1   2948.6 3018.6
## - JobInvolvement                  1   2948.7 3018.7
## <none>                                2946.7 3018.7
## - JobLevel                        1   2948.9 3018.9
## - EducationField_Medical          1   2949.6 3019.6
## - JobRole_Research_Scientist      1   2949.8 3019.8
## - JobRole_Manufacturing_Director  1   2949.8 3019.8
## - YearsAtCompany                  1   2950.4 3020.4
## - Department_HR                   1   2950.7 3020.7
## - EducationField_Life_Science     1   2951.8 3021.8
## - EducationField_HR               1   2952.2 3022.2
## - JobRole_Sales_Executive         1   2952.5 3022.5
## - JobRole_Research_Director       1   2955.9 3025.9
## - Business_travel_rarely          1   2958.0 3028.0
## - TrainingTimesLastYear           1   2963.9 3033.9
## - Age                             1   2967.1 3037.1
## - WorkLifeBalance                 1   2975.0 3045.0
## - TotalWorkingYears               1   2986.9 3056.9
## - YearsWithCurrManager            1   2996.4 3066.4
## - NumCompaniesWorked              1   2997.8 3067.8
## - Business_travel_frequently      1   3000.0 3070.0
## - YearsSinceLastPromotion         1   3003.1 3073.0
## - JobSatisfaction                 1   3015.9 3085.9
## - married                         1   3018.0 3088.0
## - divorced                        1   3021.9 3091.9
## - EnvironmentSatisfaction         1   3026.4 3096.4
## - avg_work_hours                  1   3118.7 3188.7
## 
## Step:  AIC=3016.92
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + DistanceFromHome + Education + JobLevel + 
##     MonthlyIncome + NumCompaniesWorked + PercentSalaryHike + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + Business_travel_rarely + Business_travel_frequently + 
##     Department_HR + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Medical + Gender_Female + JobRole_healthcare_rep + 
##     JobRole_Laboratory_Technician + JobRole_Manager + JobRole_Manufacturing_Director + 
##     JobRole_Research_Director + JobRole_Research_Scientist + 
##     JobRole_Sales_Executive + divorced + married
## 
##                                  Df Deviance    AIC
## - DistanceFromHome                1   2947.2 3015.2
## - JobRole_healthcare_rep          1   2947.2 3015.2
## - Gender_Female                   1   2947.4 3015.4
## - JobRole_Manager                 1   2947.6 3015.6
## - PercentSalaryHike               1   2947.9 3015.9
## - MonthlyIncome                   1   2948.1 3016.1
## - Education                       1   2948.6 3016.6
## - JobRole_Laboratory_Technician   1   2948.7 3016.7
## - StockOptionLevel                1   2948.8 3016.8
## - JobInvolvement                  1   2948.9 3016.9
## <none>                                2946.9 3016.9
## - JobLevel                        1   2949.0 3017.0
## - JobRole_Manufacturing_Director  1   2950.0 3018.0
## - EducationField_Medical          1   2950.0 3018.0
## - JobRole_Research_Scientist      1   2950.1 3018.1
## - YearsAtCompany                  1   2950.7 3018.6
## - Department_HR                   1   2950.8 3018.8
## - EducationField_HR               1   2952.2 3020.2
## - JobRole_Sales_Executive         1   2952.8 3020.8
## - EducationField_Life_Science     1   2953.0 3021.0
## - JobRole_Research_Director       1   2956.1 3024.1
## - Business_travel_rarely          1   2958.4 3026.4
## - TrainingTimesLastYear           1   2963.9 3031.9
## - Age                             1   2967.6 3035.5
## - WorkLifeBalance                 1   2975.4 3043.4
## - TotalWorkingYears               1   2987.1 3055.1
## - YearsWithCurrManager            1   2996.9 3064.9
## - NumCompaniesWorked              1   2998.3 3066.3
## - Business_travel_frequently      1   3000.9 3068.9
## - YearsSinceLastPromotion         1   3003.3 3071.3
## - JobSatisfaction                 1   3015.9 3083.9
## - married                         1   3018.5 3086.5
## - divorced                        1   3021.9 3089.9
## - EnvironmentSatisfaction         1   3027.0 3095.0
## - avg_work_hours                  1   3119.2 3187.2
## 
## Step:  AIC=3015.19
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + MonthlyIncome + 
##     NumCompaniesWorked + PercentSalaryHike + StockOptionLevel + 
##     TotalWorkingYears + TrainingTimesLastYear + YearsAtCompany + 
##     YearsSinceLastPromotion + YearsWithCurrManager + JobInvolvement + 
##     Business_travel_rarely + Business_travel_frequently + Department_HR + 
##     EducationField_HR + EducationField_Life_Science + EducationField_Medical + 
##     Gender_Female + JobRole_healthcare_rep + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - JobRole_healthcare_rep          1   2947.5 3013.5
## - Gender_Female                   1   2947.7 3013.7
## - JobRole_Manager                 1   2947.8 3013.8
## - PercentSalaryHike               1   2948.2 3014.2
## - MonthlyIncome                   1   2948.4 3014.4
## - Education                       1   2948.8 3014.8
## - JobRole_Laboratory_Technician   1   2948.9 3014.9
## - StockOptionLevel                1   2949.1 3015.1
## - JobInvolvement                  1   2949.2 3015.2
## <none>                                2947.2 3015.2
## - JobLevel                        1   2949.2 3015.2
## - JobRole_Manufacturing_Director  1   2950.3 3016.3
## - EducationField_Medical          1   2950.3 3016.3
## - JobRole_Research_Scientist      1   2950.4 3016.4
## - YearsAtCompany                  1   2950.9 3016.9
## - Department_HR                   1   2951.3 3017.3
## - EducationField_HR               1   2952.3 3018.3
## - JobRole_Sales_Executive         1   2953.1 3019.1
## - EducationField_Life_Science     1   2953.2 3019.1
## - JobRole_Research_Director       1   2956.4 3022.4
## - Business_travel_rarely          1   2958.6 3024.6
## - TrainingTimesLastYear           1   2964.1 3030.1
## - Age                             1   2967.9 3033.9
## - WorkLifeBalance                 1   2975.6 3041.6
## - TotalWorkingYears               1   2987.2 3053.2
## - YearsWithCurrManager            1   2997.1 3063.1
## - NumCompaniesWorked              1   2998.6 3064.6
## - Business_travel_frequently      1   3001.2 3067.2
## - YearsSinceLastPromotion         1   3003.6 3069.6
## - JobSatisfaction                 1   3015.9 3081.9
## - married                         1   3019.5 3085.5
## - divorced                        1   3022.3 3088.3
## - EnvironmentSatisfaction         1   3027.4 3093.4
## - avg_work_hours                  1   3119.4 3185.5
## 
## Step:  AIC=3013.53
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + MonthlyIncome + 
##     NumCompaniesWorked + PercentSalaryHike + StockOptionLevel + 
##     TotalWorkingYears + TrainingTimesLastYear + YearsAtCompany + 
##     YearsSinceLastPromotion + YearsWithCurrManager + JobInvolvement + 
##     Business_travel_rarely + Business_travel_frequently + Department_HR + 
##     EducationField_HR + EducationField_Life_Science + EducationField_Medical + 
##     Gender_Female + JobRole_Laboratory_Technician + JobRole_Manager + 
##     JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - Gender_Female                   1   2948.1 3012.1
## - PercentSalaryHike               1   2948.5 3012.5
## - MonthlyIncome                   1   2948.7 3012.7
## - JobRole_Manager                 1   2948.9 3012.9
## - JobRole_Laboratory_Technician   1   2949.0 3013.0
## - Education                       1   2949.2 3013.3
## - StockOptionLevel                1   2949.4 3013.4
## - JobInvolvement                  1   2949.5 3013.5
## <none>                                2947.5 3013.5
## - JobLevel                        1   2949.5 3013.5
## - EducationField_Medical          1   2950.6 3014.6
## - JobRole_Research_Scientist      1   2950.7 3014.7
## - YearsAtCompany                  1   2951.2 3015.2
## - Department_HR                   1   2951.5 3015.5
## - EducationField_HR               1   2952.7 3016.6
## - JobRole_Manufacturing_Director  1   2953.0 3017.0
## - EducationField_Life_Science     1   2953.4 3017.4
## - JobRole_Sales_Executive         1   2954.1 3018.0
## - JobRole_Research_Director       1   2957.0 3021.0
## - Business_travel_rarely          1   2959.1 3023.1
## - TrainingTimesLastYear           1   2964.7 3028.7
## - Age                             1   2968.7 3032.7
## - WorkLifeBalance                 1   2975.9 3039.9
## - TotalWorkingYears               1   2987.4 3051.4
## - YearsWithCurrManager            1   2997.4 3061.5
## - NumCompaniesWorked              1   2998.9 3062.9
## - Business_travel_frequently      1   3001.8 3065.8
## - YearsSinceLastPromotion         1   3004.1 3068.1
## - JobSatisfaction                 1   3016.2 3080.2
## - married                         1   3019.5 3083.5
## - divorced                        1   3022.5 3086.5
## - EnvironmentSatisfaction         1   3027.7 3091.7
## - avg_work_hours                  1   3119.6 3183.6
## 
## Step:  AIC=3012.05
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + MonthlyIncome + 
##     NumCompaniesWorked + PercentSalaryHike + StockOptionLevel + 
##     TotalWorkingYears + TrainingTimesLastYear + YearsAtCompany + 
##     YearsSinceLastPromotion + YearsWithCurrManager + JobInvolvement + 
##     Business_travel_rarely + Business_travel_frequently + Department_HR + 
##     EducationField_HR + EducationField_Life_Science + EducationField_Medical + 
##     JobRole_Laboratory_Technician + JobRole_Manager + JobRole_Manufacturing_Director + 
##     JobRole_Research_Director + JobRole_Research_Scientist + 
##     JobRole_Sales_Executive + divorced + married
## 
##                                  Df Deviance    AIC
## - PercentSalaryHike               1   2949.1 3011.1
## - MonthlyIncome                   1   2949.2 3011.2
## - JobRole_Laboratory_Technician   1   2949.4 3011.4
## - JobRole_Manager                 1   2949.5 3011.5
## - Education                       1   2949.8 3011.8
## - JobInvolvement                  1   2949.9 3011.9
## - StockOptionLevel                1   2950.0 3012.0
## <none>                                2948.1 3012.1
## - JobLevel                        1   2950.1 3012.1
## - EducationField_Medical          1   2951.0 3013.0
## - JobRole_Research_Scientist      1   2951.2 3013.2
## - YearsAtCompany                  1   2951.8 3013.8
## - Department_HR                   1   2952.0 3014.0
## - EducationField_HR               1   2953.3 3015.3
## - JobRole_Manufacturing_Director  1   2953.6 3015.6
## - EducationField_Life_Science     1   2953.9 3015.9
## - JobRole_Sales_Executive         1   2954.5 3016.5
## - JobRole_Research_Director       1   2957.5 3019.5
## - Business_travel_rarely          1   2959.4 3021.4
## - TrainingTimesLastYear           1   2965.2 3027.2
## - Age                             1   2969.5 3031.5
## - WorkLifeBalance                 1   2976.6 3038.6
## - TotalWorkingYears               1   2987.9 3049.9
## - YearsWithCurrManager            1   2998.2 3060.2
## - NumCompaniesWorked              1   2998.9 3060.9
## - Business_travel_frequently      1   3002.0 3064.0
## - YearsSinceLastPromotion         1   3004.5 3066.5
## - JobSatisfaction                 1   3016.8 3078.8
## - married                         1   3020.0 3082.0
## - divorced                        1   3022.7 3084.7
## - EnvironmentSatisfaction         1   3028.4 3090.4
## - avg_work_hours                  1   3120.2 3182.2
## 
## Step:  AIC=3011.08
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + MonthlyIncome + 
##     NumCompaniesWorked + StockOptionLevel + TotalWorkingYears + 
##     TrainingTimesLastYear + YearsAtCompany + YearsSinceLastPromotion + 
##     YearsWithCurrManager + JobInvolvement + Business_travel_rarely + 
##     Business_travel_frequently + Department_HR + EducationField_HR + 
##     EducationField_Life_Science + EducationField_Medical + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - MonthlyIncome                   1   2950.3 3010.3
## - JobRole_Laboratory_Technician   1   2950.4 3010.4
## - JobRole_Manager                 1   2950.6 3010.6
## - Education                       1   2950.9 3010.9
## - StockOptionLevel                1   2950.9 3010.9
## - JobInvolvement                  1   2950.9 3010.9
## <none>                                2949.1 3011.1
## - JobLevel                        1   2951.1 3011.1
## - JobRole_Research_Scientist      1   2952.1 3012.1
## - EducationField_Medical          1   2952.2 3012.2
## - YearsAtCompany                  1   2952.8 3012.8
## - Department_HR                   1   2952.9 3012.9
## - EducationField_HR               1   2954.4 3014.4
## - JobRole_Manufacturing_Director  1   2954.8 3014.7
## - EducationField_Life_Science     1   2955.0 3015.0
## - JobRole_Sales_Executive         1   2955.2 3015.2
## - JobRole_Research_Director       1   2958.3 3018.3
## - Business_travel_rarely          1   2960.0 3020.0
## - TrainingTimesLastYear           1   2966.4 3026.4
## - Age                             1   2971.2 3031.2
## - WorkLifeBalance                 1   2977.7 3037.7
## - TotalWorkingYears               1   2989.0 3049.0
## - YearsWithCurrManager            1   2999.2 3059.2
## - NumCompaniesWorked              1   3000.9 3060.9
## - Business_travel_frequently      1   3002.3 3062.3
## - YearsSinceLastPromotion         1   3005.8 3065.8
## - JobSatisfaction                 1   3017.4 3077.4
## - married                         1   3021.2 3081.2
## - divorced                        1   3024.0 3084.0
## - EnvironmentSatisfaction         1   3029.7 3089.7
## - avg_work_hours                  1   3123.3 3183.3
## 
## Step:  AIC=3010.33
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + NumCompaniesWorked + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + Business_travel_rarely + Business_travel_frequently + 
##     Department_HR + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Medical + JobRole_Laboratory_Technician + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Research_Scientist + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - JobRole_Laboratory_Technician   1   2951.5 3009.5
## - JobRole_Manager                 1   2951.9 3010.0
## - Education                       1   2952.2 3010.2
## - JobInvolvement                  1   2952.3 3010.3
## <none>                                2950.3 3010.3
## - StockOptionLevel                1   2952.3 3010.3
## - JobLevel                        1   2952.5 3010.5
## - JobRole_Research_Scientist      1   2953.3 3011.3
## - EducationField_Medical          1   2953.4 3011.4
## - YearsAtCompany                  1   2954.1 3012.1
## - Department_HR                   1   2954.3 3012.3
## - EducationField_HR               1   2955.5 3013.5
## - EducationField_Life_Science     1   2956.1 3014.1
## - JobRole_Sales_Executive         1   2956.2 3014.2
## - JobRole_Manufacturing_Director  1   2956.4 3014.4
## - JobRole_Research_Director       1   2959.2 3017.2
## - Business_travel_rarely          1   2961.8 3019.8
## - TrainingTimesLastYear           1   2967.8 3025.8
## - Age                             1   2971.8 3029.8
## - WorkLifeBalance                 1   2978.7 3036.7
## - TotalWorkingYears               1   2990.8 3048.8
## - YearsWithCurrManager            1   3000.2 3058.2
## - NumCompaniesWorked              1   3002.9 3060.9
## - Business_travel_frequently      1   3005.1 3063.1
## - YearsSinceLastPromotion         1   3006.4 3064.4
## - JobSatisfaction                 1   3018.7 3076.7
## - married                         1   3024.1 3082.1
## - divorced                        1   3026.9 3084.9
## - EnvironmentSatisfaction         1   3030.6 3088.6
## - avg_work_hours                  1   3124.9 3182.9
## 
## Step:  AIC=3009.54
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + NumCompaniesWorked + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + Business_travel_rarely + Business_travel_frequently + 
##     Department_HR + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Medical + JobRole_Manager + JobRole_Manufacturing_Director + 
##     JobRole_Research_Director + JobRole_Research_Scientist + 
##     JobRole_Sales_Executive + divorced + married
## 
##                                  Df Deviance    AIC
## - JobRole_Research_Scientist      1   2953.3 3009.3
## - JobInvolvement                  1   2953.4 3009.4
## - StockOptionLevel                1   2953.4 3009.4
## <none>                                2951.5 3009.5
## - Education                       1   2953.7 3009.7
## - JobLevel                        1   2953.8 3009.8
## - EducationField_Medical          1   2954.6 3010.6
## - JobRole_Manager                 1   2954.7 3010.7
## - YearsAtCompany                  1   2955.3 3011.3
## - Department_HR                   1   2955.6 3011.6
## - JobRole_Sales_Executive         1   2956.2 3012.3
## - EducationField_HR               1   2956.8 3012.8
## - EducationField_Life_Science     1   2957.3 3013.3
## - JobRole_Research_Director       1   2959.2 3015.2
## - JobRole_Manufacturing_Director  1   2961.7 3017.7
## - Business_travel_rarely          1   2963.3 3019.3
## - TrainingTimesLastYear           1   2968.9 3024.9
## - Age                             1   2972.7 3028.7
## - WorkLifeBalance                 1   2979.7 3035.7
## - TotalWorkingYears               1   2991.9 3047.9
## - YearsWithCurrManager            1   3001.5 3057.5
## - NumCompaniesWorked              1   3004.2 3060.2
## - Business_travel_frequently      1   3006.8 3062.8
## - YearsSinceLastPromotion         1   3007.6 3063.6
## - JobSatisfaction                 1   3019.9 3075.9
## - married                         1   3026.0 3082.0
## - divorced                        1   3028.1 3084.1
## - EnvironmentSatisfaction         1   3032.2 3088.2
## - avg_work_hours                  1   3125.8 3181.8
## 
## Step:  AIC=3009.33
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + NumCompaniesWorked + 
##     StockOptionLevel + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + Business_travel_rarely + Business_travel_frequently + 
##     Department_HR + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Medical + JobRole_Manager + JobRole_Manufacturing_Director + 
##     JobRole_Research_Director + JobRole_Sales_Executive + divorced + 
##     married
## 
##                                  Df Deviance    AIC
## - StockOptionLevel                1   2955.1 3009.1
## - JobInvolvement                  1   2955.3 3009.3
## <none>                                2953.3 3009.3
## - Education                       1   2955.4 3009.4
## - JobLevel                        1   2955.6 3009.6
## - EducationField_Medical          1   2956.5 3010.5
## - JobRole_Sales_Executive         1   2956.6 3010.6
## - YearsAtCompany                  1   2956.9 3010.9
## - Department_HR                   1   2957.3 3011.3
## - JobRole_Manager                 1   2958.1 3012.1
## - EducationField_HR               1   2958.7 3012.7
## - EducationField_Life_Science     1   2959.1 3013.1
## - JobRole_Research_Director       1   2959.7 3013.7
## - Business_travel_rarely          1   2965.1 3019.1
## - JobRole_Manufacturing_Director  1   2967.1 3021.1
## - TrainingTimesLastYear           1   2970.7 3024.7
## - Age                             1   2974.2 3028.2
## - WorkLifeBalance                 1   2981.1 3035.1
## - TotalWorkingYears               1   2994.0 3048.0
## - YearsWithCurrManager            1   3002.5 3056.5
## - NumCompaniesWorked              1   3005.5 3059.5
## - Business_travel_frequently      1   3009.2 3063.2
## - YearsSinceLastPromotion         1   3009.9 3063.9
## - JobSatisfaction                 1   3022.2 3076.2
## - married                         1   3029.2 3083.2
## - divorced                        1   3030.4 3084.4
## - EnvironmentSatisfaction         1   3034.4 3088.4
## - avg_work_hours                  1   3128.6 3182.6
## 
## Step:  AIC=3009.08
## Attrition ~ EnvironmentSatisfaction + JobSatisfaction + WorkLifeBalance + 
##     avg_work_hours + Age + Education + JobLevel + NumCompaniesWorked + 
##     TotalWorkingYears + TrainingTimesLastYear + YearsAtCompany + 
##     YearsSinceLastPromotion + YearsWithCurrManager + JobInvolvement + 
##     Business_travel_rarely + Business_travel_frequently + Department_HR + 
##     EducationField_HR + EducationField_Life_Science + EducationField_Medical + 
##     JobRole_Manager + JobRole_Manufacturing_Director + JobRole_Research_Director + 
##     JobRole_Sales_Executive + divorced + married
## 
##                                  Df Deviance    AIC
## <none>                                2955.1 3009.1
## - JobInvolvement                  1   2957.1 3009.1
## - Education                       1   2957.2 3009.2
## - JobLevel                        1   2957.4 3009.4
## - EducationField_Medical          1   2957.9 3009.9
## - JobRole_Sales_Executive         1   2958.1 3010.1
## - YearsAtCompany                  1   2958.4 3010.4
## - Department_HR                   1   2959.0 3011.0
## - JobRole_Manager                 1   2959.4 3011.4
## - EducationField_HR               1   2960.4 3012.4
## - EducationField_Life_Science     1   2960.9 3012.9
## - JobRole_Research_Director       1   2961.7 3013.7
## - Business_travel_rarely          1   2966.7 3018.7
## - JobRole_Manufacturing_Director  1   2968.9 3020.9
## - TrainingTimesLastYear           1   2971.6 3023.6
## - Age                             1   2975.5 3027.5
## - WorkLifeBalance                 1   2982.5 3034.5
## - TotalWorkingYears               1   2996.0 3048.0
## - YearsWithCurrManager            1   3003.5 3055.5
## - NumCompaniesWorked              1   3006.6 3058.6
## - Business_travel_frequently      1   3011.1 3063.1
## - YearsSinceLastPromotion         1   3011.9 3063.9
## - JobSatisfaction                 1   3024.0 3076.0
## - married                         1   3029.9 3081.9
## - divorced                        1   3031.4 3083.4
## - EnvironmentSatisfaction         1   3036.2 3088.2
## - avg_work_hours                  1   3132.0 3184.0
## 
## Call:
## glm(formula = Attrition ~ EnvironmentSatisfaction + JobSatisfaction + 
##     WorkLifeBalance + avg_work_hours + Age + Education + JobLevel + 
##     NumCompaniesWorked + TotalWorkingYears + TrainingTimesLastYear + 
##     YearsAtCompany + YearsSinceLastPromotion + YearsWithCurrManager + 
##     JobInvolvement + Business_travel_rarely + Business_travel_frequently + 
##     Department_HR + EducationField_HR + EducationField_Life_Science + 
##     EducationField_Medical + JobRole_Manager + JobRole_Manufacturing_Director + 
##     JobRole_Research_Director + JobRole_Sales_Executive + divorced + 
##     married, family = binomial(link = "logit"), data = employee_data_original)
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    -0.115912   0.530311  -0.219 0.826982    
## EnvironmentSatisfaction        -0.385120   0.043307  -8.893  < 2e-16 ***
## JobSatisfaction                -0.354636   0.043130  -8.223  < 2e-16 ***
## WorkLifeBalance                -0.341789   0.065040  -5.255 1.48e-07 ***
## avg_work_hours                  0.444859   0.033910  13.119  < 2e-16 ***
## Age                            -0.032819   0.007425  -4.420 9.88e-06 ***
## Education                      -0.067026   0.046566  -1.439 0.150047    
## JobLevel                       -0.065236   0.043442  -1.502 0.133176    
## NumCompaniesWorked              0.147476   0.020251   7.282 3.28e-13 ***
## TotalWorkingYears              -0.082206   0.013577  -6.055 1.41e-09 ***
## TrainingTimesLastYear          -0.151225   0.037724  -4.009 6.10e-05 ***
## YearsAtCompany                  0.036121   0.019556   1.847 0.064733 .  
## YearsSinceLastPromotion         0.164640   0.021998   7.484 7.19e-14 ***
## YearsWithCurrManager           -0.174013   0.024636  -7.063 1.62e-12 ***
## JobInvolvement                 -0.093901   0.065688  -1.430 0.152856    
## Business_travel_rarely          0.629636   0.194771   3.233 0.001226 ** 
## Business_travel_frequently      1.437121   0.208714   6.886 5.75e-12 ***
## Department_HR                   0.580210   0.281444   2.062 0.039251 *  
## EducationField_HR               0.902870   0.394030   2.291 0.021942 *  
## EducationField_Life_Science     0.293042   0.122874   2.385 0.017084 *  
## EducationField_Medical          0.216453   0.129169   1.676 0.093790 .  
## JobRole_Manager                -0.414069   0.204516  -2.025 0.042906 *  
## JobRole_Manufacturing_Director -0.647802   0.182113  -3.557 0.000375 ***
## JobRole_Research_Director       0.503021   0.191467   2.627 0.008609 ** 
## JobRole_Sales_Executive         0.204512   0.116928   1.749 0.080284 .  
## divorced                       -1.176249   0.141858  -8.292  < 2e-16 ***
## married                        -0.897582   0.104802  -8.565  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3804.3  on 4299  degrees of freedom
## Residual deviance: 2955.1  on 4273  degrees of freedom
##   (110 observations deleted due to missingness)
## AIC: 3009.1
## 
## Number of Fisher Scoring iterations: 6

3 2.The Outcome of the Employee Attrition Analysis

3.1 a.What factors are contributing to the high attrition?

My model suggests that the factors that contribute to the high attrition (at statistically significant levels) are:

  • avg_work_hours (average work hours per day in 2015)
  • EnvironmentSatisfaction (employee’s satisfaction with the work environment)
  • JobSatisfaction (employee’s satisfaction with their job)
  • MaritalStatus (employee’s marital status)
  • NumCompaniesWorked (the total number of companies that an employee has worked for thus far)
  • BusinessTravel (how frequently an employee traveled for business purposes in 2015)
  • YearsWithCurrManager (the number of years an employee has worked under their current manager)
  • YearsSinceLastPromotion (the number of years since an employee received their last promotion)
  • TotalWorkingYears (the total number of years an employee has worked thus far)
  • TrainingTimesLastYear (the number of trainings conducted for an employee in 2015)
  • Age (an employee’s age)
  • WorkLifeBalance (the degree to which an employee feels that there is a balance between work and life)
  • Department (the department an employee belongs to)
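
Because the model is a logistic regression, each coefficient is on the log-odds scale, and exponentiating it gives an odds ratio that is often easier to communicate. The sketch below is illustrative only: it copies a handful of estimates by hand from the model summary into a hypothetical vector named `coefs` instead of reading them from the fitted model object.

```r
# Estimates copied by hand from the model summary (log-odds scale).
# `coefs` is a hypothetical helper vector, not an object from the analysis.
coefs <- c(
  avg_work_hours             = 0.444859,
  EnvironmentSatisfaction    = -0.385120,
  JobSatisfaction            = -0.354636,
  Business_travel_frequently = 1.437121,
  married                    = -0.897582
)

# exp() converts log-odds into odds ratios:
# above 1 means higher odds of attrition, below 1 means lower odds.
odds_ratios <- exp(coefs)
print(round(odds_ratios, 2))
```

Read this way, each additional average work hour per day multiplies the odds of attrition by roughly 1.56, and each one-point increase in environment satisfaction multiplies them by roughly 0.68, holding the other predictors constant. With the fitted `glm` object at hand, applying `exp()` to `coef()` of that object would give the same conversion for every term.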

I visualize each factor’s overall association with attrition as follows.

3.1.1 a.1.Visualization of the association between attrition and each significant predictor of attrition

To better grasp the association between each factor and attrition, I create several plots for each factor (plots of raw counts, plots of percentages, and plots after collapsing similar categories).

3.1.1.1 a.1.1.Creating functions for ggplots to avoid repeating code

#A function for a percentage graph (for categorical variables with fewer categories)

plot_attrition_percentage <- function(data, x_var, y_var) {
  percentages <- data %>% #calculating percentages first
    na.omit() %>%
    group_by({{x_var}}, {{y_var}}) %>%
    summarise(count = n(), .groups = "drop") %>%
    group_by({{x_var}}, .drop = TRUE) %>%
    mutate(percentage = count / sum(count) * 100)
  ggplot(percentages, aes(x = {{x_var}}, y = percentage, fill = {{y_var}})) +
    geom_bar(stat = "identity", position = "dodge", alpha = 0.5) +
    labs(x = deparse(substitute(x_var)), y = "Percentage", title = paste("Percentage of Employee", deparse(substitute(y_var)), "by", deparse(substitute(x_var)))) +
    scale_fill_manual(values = c("blue", "red"), labels = c("No Attrition", "Attrition")) +
    geom_text(aes(label = paste0(round(percentage), "%")), position = position_dodge(width = 0.9), vjust = -0.5, size = 3, color = "black") +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),
      axis.title = element_text(size = 12),
      axis.text = element_text(size = 10),
      legend.title = element_blank()
    )
}

#A function for a count graph to objectively see different magnitudes of attrition in each category
plot_attrition_count <- function(data, x_var, y_var) {
  counts <- data %>%
    na.omit() %>%
    group_by({{x_var}}, {{y_var}}) %>%
    summarise(count = n(), .groups = "drop")

  ggplot(counts, aes(x = {{x_var}}, y = count, fill = {{y_var}})) +
    geom_bar(stat = "identity", position = "dodge", alpha = 0.5) +
    labs(x = deparse(substitute(x_var)), y = "Count", title = paste("Count of Employee", deparse(substitute(y_var)), "by", deparse(substitute(x_var)))) +
    scale_fill_manual(values = c("blue", "red"), labels = c("No Attrition", "Attrition")) +
    geom_text(aes(label = count), position = position_dodge(width = 0.9), vjust = -0.5, size = 3, color = "black") +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),
      axis.title = element_text(size = 12),
      axis.text = element_text(size = 10),
      legend.title = element_blank()
    )
}

#A function for a count graph in which it does not show the raw number for each category
plot_attrition_count_no_number <- function(data, x_var, y_var) {
  counts <- data %>%
    na.omit() %>%
    group_by({{x_var}}, {{y_var}}) %>%
    summarise(count = n(), .groups = "drop")

  ggplot(counts, aes(x = {{x_var}}, y = count, fill = {{y_var}})) +
    geom_bar(stat = "identity", position = "dodge", alpha = 0.5) +
    labs(x = deparse(substitute(x_var)), y = "Count", title = paste("Count of Employee", deparse(substitute(y_var)), "by", deparse(substitute(x_var)))) +
    scale_fill_manual(values = c("blue", "red"), labels = c("No Attrition", "Attrition")) +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),
      axis.title = element_text(size = 12),
      axis.text = element_text(size = 10),
      legend.title = element_blank()
    )
}
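
As a sanity check on the percentage plots, the same within-group percentages can be reproduced with base R. The sketch below uses a small made-up data frame (`toy`, with hypothetical values, not the company data) and `prop.table()` with `margin = 1`, so that each category of the x variable sums to 100%. This mirrors what `plot_attrition_percentage()` computes before plotting.

```r
# Hypothetical toy data, only for illustrating the calculation.
toy <- data.frame(
  MaritalStatus = c("Single", "Single", "Single", "Married", "Married",
                    "Married", "Married", "Divorced", "Divorced", "Divorced"),
  Attrition     = c("Yes", "Yes", "No", "No", "No", "No", "Yes", "No", "No", "No")
)

# Cross-tabulate counts: rows are marital status, columns are attrition.
counts <- table(toy$MaritalStatus, toy$Attrition)

# margin = 1 normalizes within each row, i.e. within each marital status.
pct <- prop.table(counts, margin = 1) * 100
print(round(pct, 1))

# Each row should add up to 100%.
row_totals <- rowSums(pct)
print(row_totals)
```

One difference to keep in mind: the plotting functions call `na.omit()` on the whole data frame, which drops a row if any column is missing, whereas `table()` by default only ignores missing values in the two variables being tabulated. The two approaches can therefore give slightly different percentages on data with missing values.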

3.1.1.2 a.1.2.Employee Average Work Hours Per Day & Attrition

The plots below suggest that employees who work longer hours on average are more likely to leave the company.

#given that avg_work_hours is a continuous variable with decimals, let's create a density plot.
ggplot(employee_data_original_original, aes(x = avg_work_hours, fill = Attrition)) +
  geom_density(alpha = 0.5, color = "white") +  
  labs(x = "Average Work Hours per Day", y = "Density of Employees", title = "Employee Work Hours Distribution by Attrition", caption = "The sum of the areas for each color is equivalent to 1. \n Each shape represents the distribution of average employee work hours per day in the year 2015.") +
  scale_fill_manual(values = c("blue", "red"), labels = c("No Attrition", "Attrition")) +  
  theme_minimal() +
  geom_vline(xintercept = 8, linetype = "dashed", color = "black") +
  annotate("text", x = 8, y = 0.2, label = "8hr: Company Standard Work Hours", vjust = -0.5, hjust = 0.5, color = "black") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 13, face = "bold"),  
    axis.title = element_text(size = 12), 
    axis.text = element_text(size = 10), 
     plot.caption = element_text(hjust = 0.5, color = "#333333", margin = margin(t = 20)),
    legend.title = element_blank()       
  )

#to visualize a bar graph for non-technical audience, let's round the values in avg_work_hours 
employee_data_original_original <- employee_data_original_original %>% 
  mutate(rounded_avg_work_hours = round(avg_work_hours))

plot_attrition_count(employee_data_original_original, rounded_avg_work_hours, Attrition) +
  labs(title = "Number of Employee Attrition \n by Average Work Hours per day in the year 2015", x = "Average Work Hours Per Day (Rounded)", y = "Number of Employees")

plot_attrition_percentage(employee_data_original_original, rounded_avg_work_hours, Attrition) +
    labs(title = "Percentage of Employee Attrition \n by Average Work Hours per day in the year 2015", x = "Average Work Hours Per Day (Rounded)", y = "Percentage of Employees")

ordered_levels <- c("under 8 hours", "about 8 hours", "over 8 hours")
employee_data_original_original <- employee_data_original_original %>%
  mutate(average_work_hours_per_day = case_when(
    rounded_avg_work_hours < 8 ~ "under 8 hours",
    rounded_avg_work_hours == 8 ~ "about 8 hours",
    rounded_avg_work_hours > 8 ~ "over 8 hours"
  ) %>%
  factor(levels = ordered_levels))


plot_attrition_count(employee_data_original_original, average_work_hours_per_day, Attrition) +
  labs(title = "Number of Employee Attrition \n by Average Work Hours per day in the year 2015", x = "Average Work Hours Per Day (Rounded)", y = "Number of Employees")

plot_attrition_percentage(employee_data_original_original, average_work_hours_per_day, Attrition) +
    labs(title = "Percentage of Employee Attrition \n by Average Work Hours per day in the year 2015", x = "Average Work Hours Per Day (Rounded)", y = "Percentage of Employees")

As these graphs show, the percentage of attrition is much higher among employees whose average work hours per day exceed 8 hours.

3.1.1.3 a.1.3.Employee’s Satisfaction with work environment & Attrition

#to make the visual easier to understand, let's change labels of 'EnvironmentSatisfaction' with the category names first. 

labels <- c("Low", "Medium", "High", "Very High") 
employee_data_original_original$EnvironmentSatisfaction <- factor(employee_data_original_original$EnvironmentSatisfaction, levels = 1:4, labels = labels)

plot_attrition_percentage(employee_data_original_original, EnvironmentSatisfaction, Attrition) +
  labs(title = "Percentage of Employee Attrition across\n Levels of Employee Satisfaction with Work Environment", y = "Percentage of Employees", x = "Perceived Work Environment Satisfaction")

plot_attrition_count(employee_data_original_original, EnvironmentSatisfaction, Attrition)  +
  labs(title = "Number of Employee Attrition across\n Levels of Employee Satisfaction with Work Environment", y = "Number of Employees", x = "Perceived Work Environment Satisfaction") 

#Given that 'EnvironmentSatisfaction' is skewed towards high satisfaction levels (high, very high) compared to low satisfaction levels, let's merge the 'high' and 'very high' categories to better understand the association between EnvironmentSatisfaction and Attrition.

employee_data_original_original <- employee_data_original_original %>%
  mutate(EnvironmentSatisfaction_fixed = recode_factor(EnvironmentSatisfaction, 
                                                "Low" = "Low", 
                                                "Medium" = "Medium", 
                                                "High" = "High", 
                                                "Very High" = "High"))

table(employee_data_original_original$EnvironmentSatisfaction_fixed)
## 
##    Low Medium   High 
##    845    856   2684
plot_attrition_percentage(employee_data_original_original, EnvironmentSatisfaction_fixed, Attrition) +
   labs(title = "Percentage of Employee Attrition across\n Levels of Employee Satisfaction with Work Environment", y = "Percentage of Employees", x = "Perceived Work Environment Satisfaction")

plot_attrition_count(employee_data_original_original, EnvironmentSatisfaction_fixed, Attrition) + 
    labs(title = "Number of Employee Attrition across\n Levels of Employee Satisfaction with Work Environment", y = "Number of Employees", x = "Perceived Work Environment Satisfaction") 

These plots suggest that employees are less likely to leave their company if they feel satisfied with their work environment.

3.1.1.4 a.1.4.Employee Job Satisfaction & Attrition

#to make the visual easier to understand, let's change labels of 'JobSatisfaction' with the category names first. 
employee_data_original_original$JobSatisfaction <- factor(employee_data_original$JobSatisfaction, levels = 1:4, labels = labels)
table(employee_data_original_original$JobSatisfaction)
## 
##       Low    Medium      High Very High 
##       860       840      1323      1367
plot_attrition_percentage(employee_data_original_original, JobSatisfaction, Attrition) + 
   labs(title = "Percentage of Employee Attrition across\n Levels of Employee Job Satisfaction", y = "Percentage of Employees", x = "Job Satisfaction")

plot_attrition_count(employee_data_original_original, JobSatisfaction, Attrition) +
   labs(title = "Number of Employee Attrition across\n Levels of Employee Job Satisfaction", y = "Number of Employees", x = "Job Satisfaction")

#given that 'JobSatisfaction' is skewed towards high satisfaction levels (two categories, 'high' and 'very high', versus a single 'low' category), let's merge the 'high' and 'very high' categories to better grasp the association between JobSatisfaction and Attrition.

employee_data_original_original <- employee_data_original_original %>%
  mutate(JobSatisfaction_fixed = recode_factor(JobSatisfaction, 
                                                "Low" = "Low", 
                                                "Medium" = "Medium", 
                                                "High" = "High", 
                                                "Very High" = "High"))

plot_attrition_percentage(employee_data_original_original, JobSatisfaction_fixed, Attrition)+ 
   labs(title = "Percentage of Employee Attrition across\n Levels of Employee Job Satisfaction", y = "Percentage of Employees", x = "Job Satisfaction")

plot_attrition_count(employee_data_original_original, JobSatisfaction_fixed, Attrition)+
   labs(title = "Number of Employee Attrition across\n Levels of Employee Job Satisfaction", y = "Number of Employees", x = "Job Satisfaction")

These plots suggest that the more satisfied employees are with their job, the less likely they are to leave their company.

3.1.1.5 a.1.5.Employee Marital Status & Attrition

plot_attrition_percentage(employee_data_original_original, MaritalStatus, Attrition)+ 
   labs(title = "Percentage of Employee Attrition across\n Categories of Marital Status", y = "Percentage of Employees", x = "Marital Status")

plot_attrition_count(employee_data_original_original, MaritalStatus, Attrition)+ 
   labs(title = "Number of Employee Attrition across\n Categories of Marital Status", y = "Number of Employees", x = "Marital Status")

These plots suggest that employees who are single are more likely to leave their company compared to their married and divorced counterparts.

3.1.1.6 a.1.6.The number of companies employees have worked for thus far & Attrition

#NumCompaniesWorked & Attrition : 

employee_data_original_original$NumCompaniesWorked <- as.factor(employee_data_original$NumCompaniesWorked)

str(employee_data_original_original$NumCompaniesWorked)
##  Factor w/ 10 levels "0","1","2","3",..: 2 1 2 4 5 4 3 3 1 2 ...
table(employee_data_original_original$NumCompaniesWorked)
## 
##    0    1    2    3    4    5    6    7    8    9 
##  586 1558  438  474  415  187  208  222  147  156
plot_attrition_count(employee_data_original_original, NumCompaniesWorked, Attrition) +
  labs(title = "Number of Attrition across \n Number of Companies Employees Worked Thus Far", y = "Number of employees"
       ,x = "Number of Companies")

plot_attrition_percentage(employee_data_original_original, NumCompaniesWorked, Attrition) +
  labs(title = "Percentage of Attrition across \n Number of Companies Employees Worked Thus Far", y = "Percentage of employees"
       ,x = "Number of Companies")

#given that there are too many categories, let's merge some categories together.
num_breaks <- c(-Inf, 1, 4, Inf) 
labels <- c("0-1 Companies", "2-4 Companies", "More than 4 Companies")
employee_data_original_original <- employee_data_original_original %>%
  #convert via as.character() first: as.numeric() on a factor returns the level codes (1-10), not the original values (0-9)
  mutate(NumCompaniesWorked = as.numeric(as.character(NumCompaniesWorked))) %>%
  mutate(NumCompaniesWorked_c = cut(NumCompaniesWorked, breaks = num_breaks, labels = labels))

plot_attrition_count(employee_data_original_original, NumCompaniesWorked_c, Attrition) +
    labs(title = "Number of Attrition across \n Number of Companies Employees Worked Thus Far", y = "Number of employees"
       ,x = "Number of Companies")

plot_attrition_percentage(employee_data_original_original, NumCompaniesWorked_c, Attrition) +
    labs(title = "Percentage of Attrition across \n Number of Companies Employees Worked Thus Far", y = "Percentage of employees"
       ,x = "Number of Companies")

These plots suggest that, overall, as the number of companies that employees have worked for increases, their likelihood of attrition also increases.
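
One R detail is worth keeping in mind whenever a count variable has been turned into a factor and is later converted back to numbers for binning: `as.numeric()` applied to a factor returns the internal level codes, not the original values. A minimal sketch with a made-up vector (`x` is hypothetical, not a column from the data):

```r
# A factor whose labels are numbers; the levels are "0", "1", "3", "9".
x <- factor(c(0, 1, 3, 9))

# Direct conversion returns the position of each level, not its label.
codes <- as.numeric(x)

# Going through as.character() recovers the original values.
values <- as.numeric(as.character(x))

print(codes)
print(values)
```

For a variable that starts at 0, such as the number of previous companies, the level codes are shifted by one relative to the real values, so any breaks passed to `cut()` afterwards would land in the wrong place unless the conversion goes through `as.character()`.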

3.1.1.7 a.1.7.How frequently an employee travels for business purposes & Attrition

ordered_categories <- c("Non-Travel", "Travel_Rarely", "Travel_Frequently")
employee_data_original_original$BusinessTravel <- factor(employee_data_original_original$BusinessTravel, 
                                       levels = ordered_categories, 
                                       ordered = TRUE)


plot_attrition_count(employee_data_original_original, BusinessTravel, Attrition) +
    labs(title = "Number of Attrition by\n Frequency of Business Travel ", y = "Number of employees"
       ,x = "Frequency of Business Travel")

plot_attrition_percentage(employee_data_original_original, BusinessTravel, Attrition) +
      labs(title = "Percentage of Attrition by\n Frequency of Business Travel ", y = "Percentage of employees"
       ,x = "Frequency of Business Travel")

Employees who travel more frequently for business are more likely to leave the company than their counterparts who travel less often.

3.1.1.8 a.1.8.The number of years working under current manager & attrition

plot_attrition_count(employee_data_original_original, YearsWithCurrManager, Attrition) +
    labs(title = "Number of Attrition by\n Number of Years with Current Manager ", y = "Number of employees"
       ,x = "Number of Years with Current Manager")

plot_attrition_percentage(employee_data_original_original, YearsWithCurrManager, Attrition)+
    labs(title = "Percentage of Attrition by\n Number of Years with Current Manager ", y = "Percentage of employees"
       ,x = "Number of Years with Current Manager")

year_breaks <- c(-Inf, 0, 5, 10, Inf) 
employee_data_original_original <- employee_data_original_original %>%
  mutate(YearsWithCurrManager_category = cut(YearsWithCurrManager, 
                                             breaks = year_breaks, 
                                             labels = c("0 years", "1-5 years", "6-10 years", "Above 10 years")))
plot_attrition_count(employee_data_original_original, YearsWithCurrManager_category, Attrition) +
      labs(title = "Number of Attrition by\n Number of Years with Current Manager ", y = "Number of employees"
       ,x = "Number of Years with Current Manager")

plot_attrition_percentage(employee_data_original_original, YearsWithCurrManager_category, Attrition) +
    labs(title = "Percentage of Attrition by\n Number of Years with Current Manager ", y = "Percentage of employees"
       ,x = "Number of Years with Current Manager")

The attrition risk is highest when employees have spent less than a year (coded as 0 years) with their current manager. As the number of years with one’s current manager increases, the attrition risk diminishes.
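
The category boundaries depend on how `cut()` treats its breaks. By default the intervals are closed on the right, so with `year_breaks` a value of 0 falls in the first bin, 5 in the second, and 10 in the third. The sketch below checks this on a few made-up values (`years` and `year_labels` are hypothetical names introduced for illustration).

```r
year_breaks <- c(-Inf, 0, 5, 10, Inf)
year_labels <- c("0 years", "1-5 years", "6-10 years", "Above 10 years")

# Values chosen to sit on and around each boundary.
years <- c(0, 1, 5, 6, 10, 11)

# Default right = TRUE gives the intervals (-Inf, 0], (0, 5], (5, 10], (10, Inf].
binned <- cut(years, breaks = year_breaks, labels = year_labels)

print(data.frame(years, binned))
```

Passing `right = FALSE` would make the intervals closed on the left instead, which is the better fit when a label is meant to include its lower boundary.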

3.1.1.9 a.1.9.The number of years since an employee received their last promotion

plot_attrition_count(employee_data_original_original, YearsSinceLastPromotion, Attrition) +
      labs(title = "Overview of Employee Attrition by\n Number of Years Since Last Promotion ", y = "Number of employees"
       ,x = "Number of Years Since Last Promotion")

plot_attrition_count_no_number(employee_data_original_original, YearsSinceLastPromotion, Attrition) +
      labs(title = "Overview of Employee Attrition by\n Number of Years Since Last Promotion ", y = "Number of employees"
       ,x = "Number of Years Since Last Promotion")

plot_attrition_percentage(employee_data_original_original, YearsSinceLastPromotion, Attrition)  +
      labs(title = "Overview of Employee Attrition by\n Number of Years Since Last Promotion ", y = "Percentage of employees"
       ,x = "Number of Years Since Last Promotion")

#let's collapse the data into fewer categories to better make sense of the data
employee_data_original_original <- employee_data_original_original %>%
  mutate(SinceLastPromotion_category = cut(YearsSinceLastPromotion, 
                                             breaks = year_breaks, 
                                             labels = c("0 years", "1-5 years", "6-10 years", "Above 10 years")))
plot_attrition_count(employee_data_original_original, SinceLastPromotion_category, Attrition) +
      labs(title = "Overview of Employee Attrition by\n Number of Years Since Last Promotion ", y = "Number of employees"
       ,x = "Number of Years Since Last Promotion")

plot_attrition_percentage(employee_data_original_original, SinceLastPromotion_category, Attrition)  +
      labs(title = "Overview of Employee Attrition by\n Number of Years Since Last Promotion ", y = "Percentage of employees"
       ,x = "Number of Years Since Last Promotion")

#let's create a violin plot to better grasp the association

ggplot(employee_data_original_original, aes(x = Attrition, y = YearsSinceLastPromotion, fill = Attrition)) +
  geom_violin(color = "white", alpha = 0.5) +
  labs(x = "Attrition", y = "Years Since Last Promotion", title = "Association between Attrition and Years Since Last Promotion") +
  scale_fill_manual(values = c("No" = "blue", "Yes" = "red"), labels = c("No Attrition", "Attrition")) +  
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.title = element_blank()
  )

It is important to point out that this measure is potentially misleading. Employees who have never received a promotion were coded as 0, along with those who received a promotion less than a year ago. Therefore, this measure should be corrected. Nevertheless, considering that most employees are concentrated between 0 and 5 years since their last promotion, the plots show an overall negative association between Years Since Last Promotion and attrition. Note that the model coefficient for YearsSinceLastPromotion is positive (0.165) once the other predictors are held constant, so the raw pattern in these plots and the adjusted effect in the model point in different directions.

3.1.1.10 a.1.10.The number of total working years & attrition

plot_attrition_count(employee_data_original_original, TotalWorkingYears, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Number of Working Years", y = "Number of employees"
       ,x = "Number of Working Years")

plot_attrition_count_no_number(employee_data_original_original, TotalWorkingYears, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Number of Working Years", y = "Number of employees",
       x = "Number of Working Years")

employee_data_original_original <- employee_data_original_original %>%
  mutate(TotalWorkingYears_c = cut(TotalWorkingYears, 
                                             breaks = year_breaks, 
                                             labels = c("0 years", "1-5 years", "6-10 years", "Above 10 years")))
plot_attrition_count(employee_data_original_original, TotalWorkingYears_c, Attrition)  +
  labs(title = "Overview of Employee Attrition by\n Number of Working Years", y = "Number of employees"
       ,x = "Number of Working Years")

plot_attrition_percentage(employee_data_original_original, TotalWorkingYears_c, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Number of Working Years", y = "Percentage of employees",
       x = "Number of Working Years")

The attrition risk is highest for employees in their first year of work and diminishes as an employee’s total working years increase.

3.1.1.11 a.1.11.The number of training times an employee received last year & attrition

plot_attrition_count(employee_data_original_original, TrainingTimesLastYear, Attrition) +
    labs(title = "Overview of Employee Attrition by\n Number of Training Received in Last Year", y = "Number of employees",
       x = "Number of Training Received in Last Year")

plot_attrition_percentage(employee_data_original_original, TrainingTimesLastYear, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Number of Training Received in Last Year", y = "Percentage of employees",
       x = "Number of Training Received in Last Year")

number_breaks<- c(-Inf, 0, 3, Inf) 
employee_data_original_original <- employee_data_original_original %>%
  mutate(TrainingTimesLastYear_category = cut(TrainingTimesLastYear, 
                                             breaks = number_breaks, 
                                             labels = c("0 times", "1-3 times", "Above 3 times")))

plot_attrition_count(employee_data_original_original, TrainingTimesLastYear_category, Attrition)+
    labs(title = "Overview of Employee Attrition by\n Number of Training Received in Last Year", y = "Number of employees",
       x = "Number of Training Received in Last Year")

plot_attrition_percentage(employee_data_original_original, TrainingTimesLastYear_category, Attrition)+
  labs(title = "Overview of Employee Attrition by\n Number of Training Received in Last Year", y = "Percentage of employees",
       x = "Number of Training Received in Last Year")

Overall, as employees receive more training, their attrition risk goes down. Perhaps employees perceive training as a growth opportunity and a positive sign.

3.1.1.12 a.1.12.Employee Age & attrition

plot_attrition_count(employee_data_original_original, Age, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Employee Age", y = "Number of employees",
       x = "Age")

plot_attrition_count_no_number(employee_data_original_original, Age, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Employee Age", y = "Number of employees",
       x = "Age")

plot_attrition_percentage(employee_data_original_original, Age, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Employee Age", y = "Percentage of employees",
       x = "Age")

age_breaks <- c(-Inf, 30, 40, Inf)

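employee_data_original_original <- employee_data_original_original %>%
  #right = FALSE makes the bins [-Inf, 30), [30, 40), [40, Inf) so that ages 30 and 40 fall under "30s" and "40s and above"
  mutate(Age_category = cut(Age, breaks = age_breaks, right = FALSE, labels = c("20s and below", "30s", "40s and above")))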
plot_attrition_count(employee_data_original_original, Age_category, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Employee Age", y = "Number of employees",
       x = "Age")

plot_attrition_percentage(employee_data_original_original, Age_category, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Employee Age", y = "Percentage of employees",
       x = "Age")

Attrition risk is especially concerning for employees in their 20s. More generally, age and attrition have a negative relationship. In other words, older employees are less likely to leave the company than their younger counterparts.

3.1.1.13 a.1.13.Employee’s work life balance & attrition

labels1 <- c("Bad", "Good", "Better", "Best") #change labels for WorkLifeBalance
employee_data_original_original$WorkLifeBalance <- factor(employee_data_original_original$WorkLifeBalance, levels = 1:4, labels = labels1)

plot_attrition_count(employee_data_original_original, WorkLifeBalance, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Perceived Work Life Balance", y = "Number of employees",
       x = "Work Life Balance")

plot_attrition_percentage(employee_data_original_original, WorkLifeBalance, Attrition) +
  labs(title = "Overview of Employee Attrition by\n Perceived Work Life Balance", y = "Percentage of employees",
       x = "Work Life Balance")

Employees who enjoy a better work-life balance are less likely to leave the company than their counterparts who perceive their work-life balance as bad. Overall, there is a negative association between work-life balance and attrition.

3.1.1.14 a.1.14.Employee’s Department & Attrition

plot_attrition_count(employee_data_original_original, Department, Attrition) +
  labs(title = "Overview of Employee Attrition Across \n Departments", y = "Number of employees",
       x = "Department")

plot_attrition_percentage(employee_data_original_original, Department, Attrition) +
  labs(title = "Overview of Employee Attrition Across \n Departments", y = "Percentage of employees",
       x = "Department")

The Human Resources department has the highest attrition risk.

3.2 b.Which variable is the most important and needs to be addressed straight away?

The bar chart below suggests that the most important variable responsible for employee attrition is avg_work_hours (the average work hours per day in 2015), which has the largest absolute z statistic in the model. According to the bar charts showing the association between average work hours per day and attrition, employees who work over 8 hours per day on average are more likely to leave the company than their counterparts who work less.

# Create summary table and filter significant variables
library(knitr)
library(broom)
library(dplyr)

summary_table <- tidy(new.step.model)
significant_vars <- summary_table %>%
  filter(term != "(Intercept)", p.value < 0.05) %>%  # the intercept is not a predictor
  arrange(p.value)

# Add a column for ranking the importance of each variable
significant_vars$Importance_Rank <- seq_along(significant_vars$p.value)

cat("Significant Predictors of Employee Attrition\n")
## Significant Predictors of Employee Attrition
# Convert the table into a bar chart so the ranking is easier to read.
significant_vars_graph <- significant_vars %>%
  mutate(importance = abs(statistic))

importance_barplot <- ggplot(significant_vars_graph, aes(x = reorder(term, importance), y = importance)) +
  geom_bar(stat = "identity", fill = "purple", alpha = 0.7) +
  labs(x = "Variables", y = "Absolute Value of Z Statistic", title = "Important Factors Predicting Employee Attrition", caption = "The absolute value of the Z statistic for each variable was calculated \n using a logistic regression model to predict employee attrition.") +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold", margin = margin(b = 20)),
    plot.caption = element_text(hjust = 0.5, size = 8, color = "#333333")
  )

print(importance_barplot)

plot_attrition_percentage(employee_data_original_original, average_work_hours_per_day, Attrition) +
  labs(title = "Overview of Employee Attrition by \n Average Number of Work Hours Per day", y = "Percentage of employees",
       x = "Average Work Hours Per Day (in 2015)")

plot_attrition_count(employee_data_original_original, average_work_hours_per_day, Attrition) +
  labs(title = "Overview of Employee Attrition by \n Average Number of Work Hours Per day", y = "Number of employees",
       x = "Average Work Hours Per Day (in 2015)")
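
The ranking in this section orders predictors by the absolute value of their z statistic. The sketch below illustrates that idea on simulated data, so every column name and effect size in it is an assumption and none of it comes from the company's data. It also reports odds ratios alongside |z|, because |z| reflects how strong the evidence for an association is (which depends on sample size and predictor scale), while the odds ratio describes how large the association is per unit of the predictor.

```r
# Simulated data: column names and effect sizes are hypothetical.
set.seed(42)
n <- 1000
sim <- data.frame(
  avg_hours = rnorm(n, mean = 8, sd = 1.5),
  tenure    = rnorm(n, mean = 5, sd = 2),
  noise     = rnorm(n)
)

# Assumed data-generating process: longer hours raise the log-odds of
# leaving, longer tenure lowers them, and `noise` has no effect.
log_odds <- -1.5 + 0.8 * (sim$avg_hours - 8) - 0.2 * (sim$tenure - 5)
sim$left <- rbinom(n, size = 1, prob = plogis(log_odds))

fit <- glm(left ~ avg_hours + tenure + noise, data = sim, family = binomial)

# Drop the intercept, then rank the remaining terms by |z|.
coef_mat <- summary(fit)$coefficients
coef_mat <- coef_mat[rownames(coef_mat) != "(Intercept)", , drop = FALSE]
ranked <- data.frame(
  term       = rownames(coef_mat),
  odds_ratio = exp(coef_mat[, "Estimate"]),
  importance = abs(coef_mat[, "z value"]),
  row.names  = NULL
)
ranked <- ranked[order(-ranked$importance), ]
print(ranked)
```

Reading the two columns together helps separate a predictor that is strongly supported by the data from one that also has a large practical effect.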

3.3 c.What changes should the company make to their workplace to support better retention?

While there are many factors that need to be addressed to improve retention, the most important thing this company can do is to lower the average work hours of employees who were overworked in the previous year. This can be done by hiring more employees into the teams to which these employees belong. Additionally, leaders should make it clear to middle managers in these teams that working well over 8 hours per day should be discouraged. However, to gain a deeper understanding, the company should interview and observe the teams to which these overworked employees belong to find out what causes them to work well beyond 8 hours per day on average. Once these causes have been identified by collecting more data, better interventions can be designed and implemented.
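
As a first step toward locating where overwork is concentrated, one option is to compute the share of employees averaging more than 8 hours per day within each group. The sketch below is a minimal illustration on made-up rows. It assumes Department as a stand-in for team and reuses the average_work_hours_per_day column name from the plots above; the values are hypothetical.

```r
# Hypothetical rows; Department is used here as a stand-in for team.
hours <- data.frame(
  Department = c("Sales", "Sales", "Sales", "HR", "HR", "R&D", "R&D", "R&D"),
  average_work_hours_per_day = c(9.5, 7.2, 8.6, 10.1, 7.9, 6.8, 7.5, 8.2)
)

# Flag employees who average more than 8 hours per day.
hours$overworked <- hours$average_work_hours_per_day > 8

# The mean of a logical vector is the share of TRUE values per group.
share_by_dept <- aggregate(overworked ~ Department, data = hours, FUN = mean)
share_by_dept <- share_by_dept[order(-share_by_dept$overworked), ]
print(share_by_dept)
```

Groups at the top of this table would be natural candidates for the interviews and observations described above.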

4 3. My recommendations to improve the efficiency of the company’s data collection & analysis process as a data strategist

4.1 a.the need for additional data collection & measure development

The company’s data collection and analysis process can be improved by collecting additional data in three areas: longitudinal data, exit surveys and interviews, and team-level metrics.

4.1.0.1 a.1.Longitudinal data (over several years).

The job market shifts constantly due to the economy, the development of new technologies, and unpredictable events such as the COVID-19 pandemic. Therefore, to better identify the factors that drive employee attrition, collecting longitudinal data over several years would be beneficial. Doing so makes it possible to separate seasonal effects from the association between a predictor and attrition. Currently, only one year of employee data is available for analysis.

4.1.0.2 a.2.Exit Survey + Interview

It would be unwise to assume that the variables that drive employee attrition stay the same forever. For example, in Japan, the generation that grew up during the recent three-decade-long economic downturn has different priorities in life from the older generation that grew up when Japan’s economic prosperity rivaled that of the United States. In general, these individuals are less driven to become wealthy or to succeed in their careers than the older generation. Therefore, conducting exit surveys or interviews and looking for common themes in the responses might help the company identify new factors that drive employee attrition and develop new metrics using a psychometric approach.

4.1.0.3 a.3.Team level metrics

The data tables I received for this analysis contained few team-level metrics such as leader behaviors, team goal interdependence, or peer ratings. Current workplace research suggests that various team-level metrics, such as a manager’s leadership style and team psychological safety, significantly influence employees’ decisions to leave their companies. The company should first conduct a literature review and then add team-level metrics that have been scientifically validated as predictors of attrition to its annual survey. This would improve its ability to predict attrition and to identify areas for improvement at the team level, which in turn might improve retention.

4.2 b.Adjusting current scales for several measures

In addition to collecting more data, the company should improve the scales of its current metrics and define those metrics more precisely.

4.2.0.1 b.1.Improving the clarity of measurement (metric)

  • The ‘EnvironmentSatisfaction’ variable indicates whether an employee finds their workplace satisfactory. However, ‘work environment’ is ambiguous. Some employees could interpret it as their immediate team; others could interpret it as their office space. Therefore, this metric needs to be defined more precisely. For example, we could measure satisfaction with team members and satisfaction with the office environment separately.

4.2.0.2 b.2.Improving the existing scales in data

  • ‘YearsSinceLastPromotion’ is supposed to measure the “number of years since last promotion”. However, this variable has no option to indicate that an employee has never received a promotion. As a result, it is impossible to distinguish those who have never received a promotion from those who received one less than a year ago. This is a critical flaw in how the variable measures the “number of years since last promotion”. At a minimum, it should offer an option to indicate that one has never received a promotion.

  • The scale of ‘WorkLifeBalance’ is skewed towards positive work-life balance. While it has only 1 option for negative perceptions of work-life balance (‘bad’), it has 3 for positive perceptions (‘good’, ‘better’, ‘best’). Therefore, it is difficult to accurately measure employees’ negative perceptions of work-life balance. Collapsing all negative perceptions into one category may be losing critical data points.
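
The two scale adjustments above could be encoded roughly as follows. This is only a sketch on made-up values: the last_promotion_year column, the EverPromoted flag, the survey year, and the five balanced labels are all assumptions introduced for illustration, not fields or labels from the company's data.

```r
# 1) Promotion history: keep "never promoted" distinct from "promoted recently".
# Hypothetical records; last_promotion_year is NA when no promotion occurred.
promo <- data.frame(
  EmployeeID = 1:4,
  last_promotion_year = c(2015, 2012, NA, 2015)
)
survey_year <- 2015  # assumed year of data collection

promo$EverPromoted <- !is.na(promo$last_promotion_year)
# Years since last promotion is left as NA for never-promoted employees,
# so a value of 0 can only mean "promoted within the last year".
promo$YearsSinceLastPromotion <- ifelse(
  promo$EverPromoted,
  survey_year - promo$last_promotion_year,
  NA
)
print(promo)

# 2) Work-life balance: a symmetric 5-point scale with a neutral midpoint.
balanced_labels <- c("Very bad", "Bad", "Neutral", "Good", "Very good")
responses <- c(2, 5, 3, 1, 4, 2)  # hypothetical survey answers
wlb <- factor(responses, levels = 1:5, labels = balanced_labels, ordered = TRUE)

# Negative perceptions now span two categories instead of one.
negative <- as.integer(wlb) < 3
print(table(wlb))
```

With a separate EverPromoted flag, a model can include both variables, and the balanced scale keeps degrees of negative perception from being merged into a single ‘bad’ category.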